In this chapter
On NUMA hosts, multi-CPU distribution of traffic can incur a performance penalty because
only the NUMA node local to a SmartNIC benefits from direct data I/O to the L3 cache of the
NUMA node. An expansion bus enables bonding two SmartNICs, ensuring local cache
access.
Note: This chapter only applies to the following:
- A pair of NT100E3-1-PTP SmartNICs.
- A pair of NT40A01 SmartNICs running on the 4 × 10/1 Gbit/s SLB image.
- A pair of NT200A02 SmartNICs running on the 2 × 40 Gbit/s SLB image or the 8 × 10 Gbit/s SLB image.
Note: The two SmartNICs must be connected by an interconnect cable and a time synchronization
cable.
Note: The NT40A01 SmartNIC running on the 4 × 10/1 Gbit/s SLB image is supported on Linux
only.
QPI bypass concept
On modern Intel architecture NUMA hosts, two technologies have improved I/O performance
significantly:
- QuickPath Interconnect (QPI) provides fast connections between multiple processors and I/O hubs.
- Direct Data I/O (DDIO) is an enhancement of DMA that allows data to be transferred directly between a device in a PCIe slot and the L3 cache of the processors local to the PCIe slot.
But writing from a device in a PCIe slot to memory on a remote NUMA node through QPI incurs
latency in several ways:
- The local L3 cache is polluted by data destined for the remote NUMA node.
- The QPI itself causes latency.
- The QPI memory write causes a flush of remote L3 cache lines (enforced by the cache coherency protocol).
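Whether a write has to cross the QPI depends on which NUMA node the PCIe slot is local to. As a minimal sketch, assuming a Linux host, the following reads the NUMA node that the kernel reports for a PCIe device from the sysfs numa_node file. The function names and the sysfs_root parameter are introduced here for illustration only and are not part of any Napatech software.

```python
from pathlib import Path
from typing import Optional


def parse_numa_node(text: str) -> Optional[int]:
    """Parse the content of a sysfs numa_node file.

    The kernel writes -1 when it has no NUMA affinity to report for the
    device; that case is returned as None.
    """
    node = int(text.strip())
    return node if node >= 0 else None


def pci_numa_node(pci_address: str,
                  sysfs_root: str = "/sys/bus/pci/devices") -> Optional[int]:
    """Return the NUMA node local to the PCIe device at pci_address.

    pci_address is the full domain:bus:device.function string as listed
    under sysfs_root. sysfs_root is a parameter only so that the function
    can be pointed at a different directory (an assumption of this sketch).
    """
    path = Path(sysfs_root) / pci_address / "numa_node"
    return parse_numa_node(path.read_text())
```

For example, calling pci_numa_node("0000:3b:00.0") (a hypothetical address) returns the node number of the CPU socket that the slot is attached to, or None if the kernel reports no affinity.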
To bypass the QPI and ensure local cache access, data destined for a remote NUMA node must
be transferred to the SmartNIC local to that NUMA node before it crosses the PCIe bus. Some Napatech
SmartNICs enable this through an expansion bus that allows two SmartNICs to be
interconnected:
- A pair of NT100E3-1-PTPs can be bonded as peers so that streams can be redirected to the peer SmartNIC.
- A pair of NT40A01 SmartNICs running on the 4 × 10/1 Gbit/s SLB image or a pair of NT200A02 SmartNICs running on the 2 × 40 Gbit/s SLB image or the 8 × 10 Gbit/s SLB image can be bonded in a master-slave configuration so that all frames received on the master SmartNIC are copied to the slave SmartNIC.
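The delivery decision behind QPI bypass can be sketched as follows. This is a conceptual illustration only, not the Napatech configuration interface: the SmartNIC names, the NUMA node mapping and both function names are hypothetical, and on real hardware the redirection is performed by the SmartNICs over the expansion bus.

```python
from typing import Dict


def pick_local_smartnic(target_numa_node: int,
                        nic_numa_nodes: Dict[str, int]) -> str:
    """Return the SmartNIC whose PCIe slot is local to target_numa_node."""
    for nic, node in nic_numa_nodes.items():
        if node == target_numa_node:
            return nic
    raise LookupError(f"no SmartNIC is local to NUMA node {target_numa_node}")


def needs_expansion_bus(receiving_nic: str,
                        target_numa_node: int,
                        nic_numa_nodes: Dict[str, int]) -> bool:
    """True if the data must move to the other SmartNIC before PCIe transfer."""
    return pick_local_smartnic(target_numa_node, nic_numa_nodes) != receiving_nic


# Hypothetical two-socket host with one SmartNIC per NUMA node.
nics = {"nic0": 0, "nic1": 1}

# Traffic received on nic0 for an application on NUMA node 1 is handed to
# nic1 over the expansion bus, so the DMA write lands in the L3 cache of
# node 1 without crossing the QPI.
crosses_bus = needs_expansion_bus("nic0", 1, nics)

# Traffic received on nic1 for the same application is already local.
stays_local = not needs_expansion_bus("nic1", 1, nics)
```

In this sketch both crosses_bus and stays_local evaluate to True. In either case the PCIe transfer is issued by the SmartNIC that sits on the same NUMA node as the destination memory.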