Receive Side Scaling (RSS)

Link-Inline™ Software Features

Platform: Napatech SmartNIC
Content Type: Feature Description
Capture Software Version: Link-Inline™ Software 3.0

When frames are forwarded to the host, the SmartNIC can distribute them across multiple RX queues using receive side scaling (RSS), which enables traffic to be processed on multiple CPU cores.

Flow-aware load distribution

The SmartNIC can distribute traffic across a maximum of 128 queues. Distribution uses the Napatech hashing algorithm, which operates on flow information. By default, a 5-tuple hash function generates the hash values from the contents of these header fields:
  • 32-bit IPv4 / 128-bit IPv6 source address
  • 32-bit IPv4 / 128-bit IPv6 destination address
  • 16-bit UDP, TCP or SCTP source port number
  • 16-bit UDP, TCP or SCTP destination port number
  • 8-bit IPv4/IPv6 protocol number / next header
Note: The 5-tuple hash function is applied to the outer layer.

Frames must contain the required protocol fields in order to be valid for the given hash function. It is therefore recommended to add a filter to ensure that the protocol fields used in the hash function are present. Otherwise, the distribution may show an unexpected result.
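As an illustration of flow-aware distribution, the following minimal sketch maps an outer-layer IPv4 5-tuple to a queue index. The Napatech hashing algorithm is not described here, so FNV-1a is used purely as a stand-in; the names five_tuple, fnv1a and queue_for_flow are hypothetical and not part of any Napatech or DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RX_QUEUES 128u /* maximum number of queues stated in the text */

/* Outer-layer 5-tuple for IPv4; IPv6 would carry 128-bit addresses. */
struct five_tuple {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  protocol;
};

/* One FNV-1a pass over a byte range. Stand-in hash, NOT the Napatech algorithm. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* Map a flow to an RX queue index in the range [0, num_queues). */
static unsigned queue_for_flow(const struct five_tuple *t, unsigned num_queues)
{
    assert(num_queues > 0 && num_queues <= MAX_RX_QUEUES);

    uint32_t h = 2166136261u; /* FNV offset basis */
    /* Hash field by field so that struct padding never enters the hash. */
    h = fnv1a(h, &t->src_ip, sizeof t->src_ip);
    h = fnv1a(h, &t->dst_ip, sizeof t->dst_ip);
    h = fnv1a(h, &t->src_port, sizeof t->src_port);
    h = fnv1a(h, &t->dst_port, sizeof t->dst_port);
    h = fnv1a(h, &t->protocol, sizeof t->protocol);
    return h % num_queues;
}
```

Frames with identical 5-tuple fields always yield the same queue index, which is the property that keeps a flow on one CPU core. A frame lacking one of the fields (for example, a non-TCP/UDP/SCTP frame without port numbers) has no well-defined input to such a function, which is why filtering on the required fields is recommended.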

Available hash functions per port

An alternative hash function can be selected per port through the DPDK API by calling rte_eth_dev_rss_hash_update with an rte_eth_rss_conf structure. Three hash modes are available. This table shows the header field used by each hash function and the corresponding macro combination for the rte_eth_rss_conf.rss_hf field.
Hash function                                                        Macro combination
-------------------------------------------------------------------  --------------------------------------------------------------------
IPv4 or IPv6 destination address of the outer layer                  RTE_ETH_RSS_LEVEL_OUTERMOST | RTE_ETH_RSS_L3_DST_ONLY | RTE_ETH_RSS_IP
IPv4 or IPv6 tunneled source address of the first-level inner layer  RTE_ETH_RSS_LEVEL_INNERMOST | RTE_ETH_RSS_L3_SRC_ONLY | RTE_ETH_RSS_IP
VLAN tag of the first level                                          RTE_ETH_RSS_C_VLAN

In UPF frame processing of the 5G core network, the IPv4 or IPv6 destination address of the outer layer can be used on the port for downlink traffic, and the IPv4 or IPv6 tunneled source address of the first-level inner layer can be used on the port for uplink traffic. This ensures that all frames to and from the same UE are forwarded to the same queue.

Traffic can also be distributed to multiple queues based on VLAN IDs using the VLAN tag of the first level.
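The UPF case can be sketched as follows, assuming a DPDK release that provides the RTE_ETH_RSS_* macro names listed in the table and ports that are already configured. The helper names set_port_hash and configure_upf_hashing and the port roles are hypothetical. Passing a NULL rss_key is assumed to leave the hash key unchanged, which is the generic ethdev convention and may behave differently with a specific PMD.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#include <rte_ethdev.h>

/* Hypothetical helper: apply one hash mode (rss_hf bit mask) to a port. */
static int set_port_hash(uint16_t port_id, uint64_t rss_hf)
{
    struct rte_eth_rss_conf conf;

    memset(&conf, 0, sizeof(conf));
    conf.rss_key = NULL; /* assumption: NULL keeps the current hash key */
    conf.rss_hf = rss_hf;

    int ret = rte_eth_dev_rss_hash_update(port_id, &conf);
    if (ret != 0)
        fprintf(stderr, "port %u: RSS hash update failed (%d)\n",
                (unsigned)port_id, ret);
    return ret;
}

/* Hypothetical UPF setup: one port carries downlink, the other uplink. */
static int configure_upf_hashing(uint16_t downlink_port, uint16_t uplink_port)
{
    /* Downlink: hash on the outer-layer destination IP address. */
    int ret = set_port_hash(downlink_port,
                            RTE_ETH_RSS_LEVEL_OUTERMOST |
                            RTE_ETH_RSS_L3_DST_ONLY |
                            RTE_ETH_RSS_IP);
    if (ret != 0)
        return ret;

    /* Uplink: hash on the tunneled (inner-layer) source IP address. */
    return set_port_hash(uplink_port,
                         RTE_ETH_RSS_LEVEL_INNERMOST |
                         RTE_ETH_RSS_L3_SRC_ONLY |
                         RTE_ETH_RSS_IP);
}
```

For VLAN-based distribution, the same helper would be called with RTE_ETH_RSS_C_VLAN as the rss_hf value.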

Note: Once another hash function has been selected using the DPDK API, it is not possible to change back to the default 5-tuple hash function.

Maximum number of memory maps in DPDK

Even though the SmartNIC supports a maximum of 256 queues (128 RX queues + 128 TX queues), the number of usable queues can be limited by the maximum number of memory maps in DPDK, which is set to 256 by default. Of the 256 available memory maps, 2 (or more) are used by internal processes. This means that, with the default configuration, DPDK can create at most 254 RX and TX queues in total. If an application attempts to create more memory maps than allowed, an error may be generated.

The maximum number of memory maps can be modified in the lib/eal/linux/eal_vfio.c file of the DPDK package. The following line in the file defines the value:
#define VFIO_MAX_USER_MEM_MAPS 256
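The budget described above can be checked with a short calculation. This is a minimal sketch that assumes one memory map per queue and exactly 2 reserved maps (the text says "2 (or more)", so the real limit may be lower); the constant names with the _DEFAULT and RESERVED_ prefixes and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_MAPS_DEFAULT  256u /* default VFIO_MAX_USER_MEM_MAPS quoted above */
#define RESERVED_MEM_MAPS 2u   /* assumption: the text says "2 (or more)" */

/* Upper bound on the total number of RX + TX queues for a given map limit. */
static unsigned max_total_queues(unsigned max_maps, unsigned reserved)
{
    assert(reserved <= max_maps);
    return max_maps - reserved;
}

/* True if the requested RX and TX queue counts fit within the map budget. */
static bool queues_fit(unsigned rx, unsigned tx,
                       unsigned max_maps, unsigned reserved)
{
    return rx + tx <= max_total_queues(max_maps, reserved);
}
```

Under these assumptions, max_total_queues(MEM_MAPS_DEFAULT, RESERVED_MEM_MAPS) evaluates to 254, so a request for 128 RX and 128 TX queues does not fit the default configuration, whereas 127 RX and 127 TX queues do.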

RX queues for physical ports

If RX queues are configured only for physical ports, the same number of queues is assigned to each port. For a SmartNIC with two physical ports, a maximum of 64 RX queues per port can be used. The number of queues for physical ports can be configured using the rxqs parameter in the DPDK application, as in the following DPDK testpmd command example.
./dpdk-testpmd --iova-mode=pa --vfio-vf-token=14d63f20-8445-11ea-8900-1f9ce7d5650d \
-a 0000:65:00.0,rxqs=64 -- -i
Using this command, 64 RX queues are created per port for a SmartNIC with two physical ports.

RX queues for virtual functions

A maximum of 124 RX queues (out of 128 RX queues) is available for virtual functions because 2 queues are used for physical ports and 2 queues are used for internal processes.

In addition, the maximum number of queues for virtual functions is limited by the maximum number of virtual queue pairs, which is defined in the DPDK drivers/net/virtio/virtio.h file as follows:
#define VIRTIO_MAX_VIRTQUEUE_PAIRS 8
By default, it is set to 8, meaning that DPDK supports a maximum of 8 virtual queue pairs for each virtio device. If you need to change this limit, modify the value in the header file. Evaluate and test any adjusted limit, as it can affect system performance and resource utilization.

The maximum number of virtual functions is also limited by the DPDK RTE_MAX_ETHPORTS value, which is set to 32 by default. This means that a maximum of 30 virtual functions can be created because 2 out of 32 are used for physical ports. For example, if the 124 available RX queues are evenly distributed to 30 virtual functions, a maximum of 4 queues can be assigned to each virtual function (124 / 30, rounded down). See Running a user application with the DPDK virtio PMD in DN-1354.
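The limits in this section combine as in the following minimal sketch. It uses the default values quoted in the text and assumes that each virtual function uses one RX queue per virtual queue pair; the constant and function names are hypothetical and only mirror the DPDK definitions.

```c
#include <assert.h>

#define TOTAL_RX_QUEUES          128u /* SmartNIC RX queue maximum */
#define PHYS_PORT_RX_QUEUES      2u   /* used for physical ports */
#define INTERNAL_RX_QUEUES       2u   /* used for internal processes */
#define MAX_ETHPORTS_DEFAULT     32u  /* default RTE_MAX_ETHPORTS */
#define PHYS_PORTS               2u
#define MAX_VIRTQ_PAIRS_DEFAULT  8u   /* default VIRTIO_MAX_VIRTQUEUE_PAIRS */

static unsigned min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Number of virtual functions allowed by the ethdev port limit. */
static unsigned max_vfs(unsigned max_ethports, unsigned phys_ports)
{
    assert(max_ethports >= phys_ports);
    return max_ethports - phys_ports;
}

/* RX queues per virtual function with an even split, capped by the
 * virtual queue pair limit. */
static unsigned rx_queues_per_vf(unsigned num_vfs, unsigned max_pairs)
{
    assert(num_vfs > 0);
    unsigned available = TOTAL_RX_QUEUES - PHYS_PORT_RX_QUEUES
                         - INTERNAL_RX_QUEUES; /* 124 */
    return min_u(available / num_vfs, max_pairs);
}
```

With the defaults, max_vfs(MAX_ETHPORTS_DEFAULT, PHYS_PORTS) gives 30 and rx_queues_per_vf(30, MAX_VIRTQ_PAIRS_DEFAULT) gives 4. With 10 virtual functions, the even split would allow 12 queues each, but the virtual queue pair limit caps the result at 8.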