TX Architecture

Napatech Link™ Capture Software Features

Platform: Intel® PAC, Napatech SmartNIC
Content Type: Feature Description
Capture Software Version: Link™ Capture Software 12.7

Transmission features

The transmission functionality encompasses these features:

  • Full line rate host-based transmission
  • Up to 128 host buffers/streams supported per SmartNIC
  • Traffic merge from TX buffers done in the FPGA

TX architecture overview

This figure illustrates a setup with a TX application.

[Figure: TX architecture overview. The diagram is divided into user space, kernel space and hardware, and shows these components: application thread (user), NTAPI, NTService + LibNTOS, network stream, 3GD driver, host buffer, CPU + cache (core 0 to core N - 1), IIO, memory controller, DDR memory, and the Napatech SmartNIC connected via PCIe3.]

The application writes frames to network streams, which map to TX host buffers. Data is transferred to the SmartNIC via PCIe using DMA for transmission on the specified port.

Transmission scenarios

Frames can be transmitted in different ways:

  • On a single port from a single TX host buffer
  • On a single port from multiple TX host buffers
  • On multiple ports from a single TX host buffer
  • On multiple ports on a single SmartNIC from a single TX host buffer depending on the txPort field in the standard packet descriptor of the individual frames (dynamic transmission – see Dynamic transmission configuration)

The transmission mode can be configured per stream.

Dynamic transmission configuration

By default, one TX host buffer is allocated per port that is set in the portMask attribute in a call to one of the NT_NetTxOpen functions, and the txPort field in the standard packet descriptor has no effect.

To set up dynamic transmission, where the TX port for each individual frame is determined by the txPort field, the NT_NETTX_OPEN_FLAGS_ADAPTER_MULTI_PORT_BUFFER flag must be set before the NT_NetTxOpen_Attr() function is called. The flag is selected in the flags attribute in a call to the NT_NetTxOpenAttrSetFlags() function. This allocates one TX host buffer per SmartNIC, and this host buffer can transmit to all ports on the SmartNIC that are specified in the portMask attribute in a call to the NT_NetTxOpenAttrSetPortMask() function.
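
The relationship between a port mask and a per-frame txPort value can be sketched as follows. This is a minimal illustration, not NTAPI code: it assumes that bit n of the port mask selects port n, and the helper names port_mask and can_transmit are hypothetical.

```python
def port_mask(ports):
    """Build a bit mask in which bit n is set for every port n in `ports`.

    Assumption: bit n of the mask corresponds to port n.
    """
    mask = 0
    for port in ports:
        if port < 0:
            raise ValueError("port numbers must be non-negative")
        mask |= 1 << port
    return mask


def can_transmit(mask, tx_port):
    """Return True if `tx_port` is one of the ports selected in `mask`."""
    if tx_port < 0:
        raise ValueError("port numbers must be non-negative")
    return bool((mask >> tx_port) & 1)
```

With a mask built from ports 0 and 2, a frame whose txPort is 2 would be accepted by this model, while a frame whose txPort is 1 would not.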

Typical TX application process

This is a high-level overview of a typical TX application process:

  1. The application opens a TX stream using the NT_NetTxOpen function.
  2. The application obtains a buffer handle using the NT_NetTxGet function.
  3. The application writes packet data to the buffer as required.
  4. The application releases the buffer to the stream for transmission using the NT_NetTxRelease function.
  5. The application repeats Step 2 to Step 4 as required.
  6. When the transmission is complete, the application closes the stream using the NT_NetTxClose function.

Transmission performance

The high-speed transmission functionality supports transmission at full line rate (except on NT200A02 and NT200A01 running at 2 × 100 Gbit/s) with low CPU load for all frame sizes from 64 to 10000 bytes. Packets with frame sizes above 10000 bytes are discarded instead of being transmitted.
Note: Sliced and hard-sliced frames, and frames for which the wire length and the stored length in the standard packet descriptor do not match, are not transmitted.
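
To give a sense of what full line rate means for different frame sizes, the following sketch computes the maximum number of frames per second that fit on an Ethernet link. It is a worked calculation only, assuming that the frame size includes the 4-byte FCS and that each frame occupies an additional 20 bytes on the wire (preamble, start-of-frame delimiter and minimum interframe gap). The function name is hypothetical.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 7-byte preamble + 1-byte SFD + 12-byte minimum interframe gap
MIN_FRAME_BYTES = 64          # smallest frame size named in the text
MAX_FRAME_BYTES = 10000       # larger frames are discarded according to the text


def max_frames_per_second(link_bps, frame_bytes):
    """Return the maximum whole number of frames per second at line rate.

    Assumption: frame_bytes includes the FCS but not preamble or interframe gap.
    """
    if not MIN_FRAME_BYTES <= frame_bytes <= MAX_FRAME_BYTES:
        raise ValueError("frame size must be between 64 and 10000 bytes")
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps // bits_on_wire
```

For example, a 10 Gbit/s link carries at most about 14.88 million 64-byte frames per second, which is why small frames put the highest load on the host and the PCIe bus.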

Frames can be transmitted at the speed supported by the ports, the PCIe interface and the transmission pipe. The table below shows typical transmission speeds of SmartNICs.

Typical transmission speed (Gbit/s)  SmartNICs
-----------------------------------  ---------
105                                  NT200A02 running at 2 × 100 Gbit/s
                                     NT200A01 running on the 2 × 100 Gbit/s capture/replay image
100                                  NT100E3-1-PTP running at 1 × 100 Gbit/s
100/40                               NT200A02 running at 4 × 25/10 Gbit/s
                                     NT100A01 running at 4 × 25/10 Gbit/s
80                                   NT200A02 running at 2 × 40 Gbit/s or 8 × 10 Gbit/s
                                     NT200A01 running at 2 × 40 Gbit/s or 8 × 10 Gbit/s
                                     NT80E3-2-PTP running at 2 × 40 Gbit/s or 8 × 10 Gbit/s
50/20                                NT200A02 running at 2 × 25/10 Gbit/s
                                     NT200A01 running at 2 × 25/10 Gbit/s
                                     NT50B01 running at 2 × 25/10 Gbit/s
40                                   Intel® PAC A10 GX running at 1 × 40 Gbit/s or 4 × 10 Gbit/s
40/4                                 NT100A01 running at 4 × 10/1 Gbit/s
                                     NT40A01 running at 4 × 10/1 Gbit/s
                                     NT40E3-4-PTP running on the 4 × 10/1 Gbit/s capture/replay image
                                     NT40E3-4-PTP running on the 4 × 10/1 Gbit/s capture/replay 2 image
20/2                                 NT200A02 running at 2 × 10/1 Gbit/s
                                     NT50B01 running at 2 × 10/1 Gbit/s
                                     NT20E3-2-PTP running at 2 × 10/1 Gbit/s
4                                    NT40A01 running at 4 × 1 Gbit/s

TX performance in general (both throughput and latency) strongly depends on:
  • The actual user application
  • The number of host buffers mapped to the individual ports
  • The PCIe bus utilization level and hence the PCIe bus latency
  • General server performance

Transmission configuration

The port on which a frame is transmitted, whether it is transmitted at all, and when it is transmitted can all be configured. The transmit on time stamp feature (see Transmit on Time Stamp) allows captured traffic to be transmitted, offset in time but with the same relative distance between the frames as when captured.

For each TX port, the transmission rate can be limited to a specific value or a percentage of link speed.

Ethernet CRC

On NT200A02, NT100A01 and NT50B01 SmartNICs running on a test and measurement image, control bits in the dynamic packet descriptors specify per packet whether a frame is transmitted with a new correct CRC, a new incorrect CRC (calculated and then changed to be incorrect) or the stored CRC left untouched. Other SmartNICs always generate a new Ethernet CRC for frames to be transmitted.

Layer 3 and layer 4 checksums

NT200A02, NT100A01 and NT50B01 SmartNICs running on a test and measurement image can generate layer 3 (IPv4) and layer 4 (UDP and TCP) checksums for frames to be transmitted. The checksums are controlled by control bits in the dynamic packet descriptors. This is illustrated in the net/checksum code example (see DN-0449).

It can be specified on a per-packet basis whether each of the checksums is to be left untouched, calculated correctly or made incorrect (calculated and then changed to be incorrect). UDP checksums can also be specified to be set to zero.

For the checksums to be calculated, the packet metadata must contain the layer 3 and layer 4 offsets. These can be supplied by the FPGA for packets received by the FPGA, or they must be provided by the application for packets that have been generated (or modified) by the application. For tunneled packets, the application can control which checksums are calculated by the hardware by providing the correct layer 3 and layer 4 offsets for either the inner or the outer packet. The hardware can only calculate one set of checksums, so the application must calculate the other set. For example, the application can calculate the outer IPv4 checksum (which covers the header only) and set the outer UDP checksum to zero, while the hardware calculates the inner checksums.
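
When the application has to calculate the outer IPv4 checksum itself, the calculation is the standard Internet checksum (RFC 1071) over the IPv4 header. The following is a minimal sketch of that calculation. It is not part of NTAPI, and the function name is hypothetical.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Return the IPv4 header checksum for `header`.

    The 16-bit checksum field (bytes 10 and 11) is skipped, so the result is
    the same whether or not the field is already filled in.
    """
    if len(header) < 20 or len(header) % 4:
        raise ValueError("IPv4 header must be at least 20 bytes and a multiple of 4")
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:
            continue  # skip the checksum field itself
        total += int.from_bytes(header[i:i + 2], "big")
    # Fold the carries back in (ones' complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

The result is stored in bytes 10 and 11 of the outer header in network byte order before the packet is released for transmission.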

Injection of time stamps

A time stamp can be injected into a frame to be transmitted at a specified offset on NT200A02, NT100A01 and NT50B01 SmartNICs running on a test and measurement image.

The injected data consists of a 2-byte correction value followed by the 64-bit time stamp. The correction value makes the complete 10-byte field checksum neutral. The original 10 bytes of frame data must be set to all zeros by the application before the injection, and they must belong to the same layer (fully contained in one checksum domain). The time stamp injection offset must be 2-byte aligned with the start of the data included in the checksum to be recalculated.
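
The idea of a checksum-neutral field can be illustrated with Internet checksum (RFC 1071) arithmetic. The sketch below is an assumption-based model of the principle, not a description of the SmartNIC implementation: it assumes a big-endian time stamp layout, and the function names are hypothetical. The correction value is chosen as the ones' complement of the sum of the four 16-bit words of the time stamp, so the 10-byte field contributes nothing to a ones' complement sum.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Return the 16-bit ones' complement sum of an even-length byte string."""
    if len(data) % 2:
        raise ValueError("data length must be even")
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """Return the Internet checksum (complement of the ones' complement sum)."""
    return ~ones_complement_sum(data) & 0xFFFF


def checksum_neutral_field(timestamp: int) -> bytes:
    """Return 2-byte correction value + 8-byte time stamp (10 bytes in total).

    Assumption: the time stamp is stored big-endian; the real layout may differ.
    """
    ts = struct.pack("!Q", timestamp)
    correction = ~ones_complement_sum(ts) & 0xFFFF
    return struct.pack("!H", correction) + ts
```

Replacing 10 zero bytes at a 2-byte aligned offset with such a field leaves the checksum of the surrounding data unchanged, which is why the original bytes must be zero and the offset must be aligned.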

The time stamp injection can be controlled per packet by a control bit in the dynamic packet descriptors. This is illustrated in the net/timestamp_inject code example (see DN-0449). Time stamp injection can also be applied globally by using TimestampInjectAlways in the ntservice.ini file (see DN-0449).

The value of the time stamp can be the time when the first byte of the frame is put on the wire, or the time when the last byte is put on the wire, depending on the setting of TimestampMethod in the ntservice.ini file (see DN-0449).

The time stamp injection offset is determined by the setting of TimestampInjectDynamicOffset and TimestampInjectStaticOffset in the ntservice.ini file (see DN-0449). TimestampInjectDynamicOffset can be set to the start of the frame, the end of the frame, the start of the layer 3 header or the start of the layer 4 header. The actual offset is calculated from the location given by TimestampInjectDynamicOffset by adding the value of TimestampInjectStaticOffset. If, for instance, TimestampInjectDynamicOffset is set to the end of the frame and TimestampInjectStaticOffset is set to -20, the time stamp is injected 20 bytes before the end of the frame. The combination of the two offsets must result in a location inside the current frame, and there must be enough space for the prepended 2-byte correction value, the 8-byte time stamp and a 4-byte FCS. That is, the resulting offset must be between the start of the frame and 14 bytes before the end of the frame. Otherwise, some of the time stamp is overwritten by the FCS, and the checksum will be wrong.
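
The offset rule can be checked with a small calculation. The sketch below is illustrative only: the base names are hypothetical labels and not the values accepted in ntservice.ini, and it assumes that the frame length includes the 4-byte FCS.

```python
INJECTED_BYTES = 10  # 2-byte correction value + 8-byte time stamp
FCS_BYTES = 4


def injection_offset(frame_len, dynamic_base, static_offset,
                     l3_offset=None, l4_offset=None):
    """Return the injection offset in bytes from the start of the frame.

    dynamic_base is one of the hypothetical labels "start_of_frame",
    "end_of_frame", "l3_header" or "l4_header". Raises ValueError if the
    resulting offset leaves no room for the injected data and the FCS.
    """
    bases = {
        "start_of_frame": 0,
        "end_of_frame": frame_len,
        "l3_header": l3_offset,
        "l4_header": l4_offset,
    }
    if dynamic_base not in bases:
        raise ValueError("unknown dynamic base")
    base = bases[dynamic_base]
    if base is None:
        raise ValueError("header offset required for this dynamic base")
    offset = base + static_offset
    if not 0 <= offset <= frame_len - (INJECTED_BYTES + FCS_BYTES):
        raise ValueError("offset must lie between the frame start and 14 bytes before its end")
    return offset
```

For a 128-byte frame, a dynamic base at the end of the frame with a static offset of -20 gives offset 108, which is valid. A static offset of -10 gives offset 118, which is rejected because the time stamp would overlap the FCS.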

Transmit rate limiting

Using NTAPI, the transmit rate can be limited per port using the NT_ConfigWrite function with the NT_CONFIG_PARM_PORT_SETTINGS_V2 parameter and the NtPortSettings_v2_s structure. Set NtPortSettings_v2_s.txPortRateLimit to the desired limit in bit/s (see DN-0449). A value in the range 1 to 100 is interpreted as a percentage of the link speed.

Using the config tool, apply the --tx_port_rate_limit option with a numeric value (see DN-0449). Without a unit, the value is interpreted as bit/s. A unit can be appended to the value. The units b, K, M and G work as ×1, ×1,000, ×1,000,000 and ×1,000,000,000 multipliers. With the unit %, a value in the range 1 to 100 is interpreted as a percentage of the link speed.
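
The unit rules for the config tool can be summarized in a short sketch that converts a rate limit value to bit/s. This is a hypothetical helper written to mirror the description above, not the parser used by the config tool. It assumes integer values and case-sensitive units.

```python
MULTIPLIERS = {"": 1, "b": 1, "K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def rate_limit_bps(value: str, link_speed_bps: int) -> int:
    """Convert a value such as "500M", "1000" or "25%" to bit/s.

    link_speed_bps is only used for percentage values.
    """
    text = value.strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        if not 1 <= percent <= 100:
            raise ValueError("percentage must be in the range 1 to 100")
        return link_speed_bps * percent // 100
    number, unit = text, ""
    if text and text[-1] in ("b", "K", "M", "G"):
        number, unit = text[:-1], text[-1]
    return int(number) * MULTIPLIERS[unit]
```

Under these assumptions, "500M" yields 500,000,000 bit/s, and "25%" on a 10 Gbit/s link yields 2,500,000,000 bit/s.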

Flushing TX packets

Packets remaining in TX segments can be canceled from their queues instead of being sent, so that a stream can be closed down faster. This is done by setting the CancelTxOnCloseMask parameter in the ntservice.ini file (see DN-0449). This is useful when replaying low-rate traffic.

Transmission of PCAP files

The SmartNICs have native support for transmission of PCAP files. Configuration can be done using the NT_NetTxOpen_v2 and NT_NetFileOpen_v2 functions (see DN-0449).