Transmission features
The transmission functionality encompasses these features:
- Full line rate host-based transmission
- Up to 128 host buffers/streams supported per SmartNIC
- Traffic merge from TX buffers done in the FPGA
TX architecture overview
This figure illustrates a setup with a TX application.
The application writes frames to network streams, which map to TX host buffers. Data is transferred to the SmartNIC via PCIe using DMA for transmission on the specified port.
Transmission scenarios
Frames can be transmitted in different ways:
- On a single port from a single TX host buffer
- On a single port from multiple TX host buffers
- On multiple ports from a single TX host buffer
- On multiple ports on a single SmartNIC from a single TX host buffer depending on the txPort field in the standard packet descriptor of the individual frames (dynamic transmission – see Dynamic transmission configuration)
The transmission mode can be configured per stream.
Dynamic transmission configuration
By default, one TX host buffer is allocated for each port that is set in the portMask attribute in a call to one of the NT_NetTxOpen functions, and the txPort field in the standard packet descriptor has no effect.
To set up dynamic transmission, where the TX port for each individual frame is determined by the txPort field, the NT_NETTX_OPEN_FLAGS_ADAPTER_MULTI_PORT_BUFFER flag must be set before the NT_NetTxOpen_Attr() function is called. The flag is set in the flags attribute in a call to the NT_NetTxOpenAttrSetFlags() function. This allocates one TX host buffer per SmartNIC, and this host buffer can transmit to all ports on the SmartNIC that are specified in the portMask attribute in a call to the NT_NetTxOpenAttrSetPortMask() function.
Typical TX application process
This is a high-level overview of a typical TX application process:
1. The application opens a TX stream using the NT_NetTxOpen function.
2. The application obtains a buffer handle using the NT_NetTxGet function.
3. The application writes packet data to the buffer as required.
4. The application releases the buffer to the stream for transmission using the NT_NetTxRelease function.
5. The application repeats Step 2 to Step 4 as required.
6. When the transmission is complete, the application closes the stream using the NT_NetTxClose function.
Transmission performance
Frames can be transmitted at the speed supported by the ports, the PCIe interface and the transmission pipe. Typical rates are:
- 105 Gbit/s for NT200A02 running at 2 × 100 Gbit/s and NT200A01 running on the 2 × 100 Gbit/s capture/replay image
- 100 Gbit/s for NT100E3-1-PTP running at 1 × 100 Gbit/s
- 100/40 Gbit/s for NT200A02 and NT100A01 running at 4 × 25/10 Gbit/s
- 80 Gbit/s for NT200A02, NT200A01 and NT80E3-2-PTP running at 2 × 40 Gbit/s or 8 × 10 Gbit/s
- 50/20 Gbit/s for NT200A02, NT200A01 and NT50B01 running at 2 × 25/10 Gbit/s
- 40 Gbit/s for Intel® PAC A10 GX running at 1 × 40 Gbit/s or 4 × 10 Gbit/s
- 40/4 Gbit/s for NT100A01 and NT40A01 running at 4 × 10/1 Gbit/s and NT40E3-4-PTP running on the 4 × 10/1 Gbit/s capture/replay image or the 4 × 10/1 Gbit/s capture/replay 2 image
- 20/2 Gbit/s for NT200A02 running at 2 × 10/1 Gbit/s and NT20E3-2-PTP running at 2 × 10/1 Gbit/s
- 4 Gbit/s for NT40A01 running at 4 × 1 Gbit/s
The achievable transmission rate also depends on:
- The actual user application
- The number of host buffers mapped to the individual ports
- The PCIe bus utilization level and hence the PCIe bus latency
- General server performance
Transmission configuration
The port on which a frame is transmitted, whether it is transmitted at all, and when it is transmitted can all be configured. The transmit on time stamp feature (see Transmit on Time Stamp) allows captured traffic to be transmitted offset in time, but with the same relative distance between the frames as when captured.
For each TX port, the transmission rate can be limited to a specific value or a percentage of link speed.
Ethernet CRC
On NT200A02 SmartNICs running on a test and measurement image, control bits in the dynamic packet descriptors specify per packet whether a frame is transmitted with a new correct CRC, a new incorrect CRC (calculated and then changed to be incorrect) or the stored CRC left untouched. Other SmartNICs always generate a new Ethernet CRC for frames to be transmitted.
Layer 3 and layer 4 checksums
NT200A02 SmartNICs running on a test and measurement image can generate layer 3 (IPv4) and layer 4 (UDP and TCP) checksums for frames to be transmitted. The checksums are controlled by control bits in the dynamic packet descriptors. This is illustrated in the net/checksum code example (see DN-0449).
For each checksum, it can be specified on a per-packet basis whether the checksum is left untouched, calculated correctly or made incorrect (calculated and then changed to be incorrect). UDP checksums can also be set to zero.
In order for the checksums to be calculated, the packet metadata must contain the layer 3 and layer 4 offsets. These are supplied by the FPGA for packets received by the FPGA, and must be provided by the application for packets that it has generated or modified. For tunneled packets, the application controls which checksums are calculated by the hardware by providing the layer 3 and layer 4 offsets for either the inner or the outer packet. The hardware can only calculate one set of checksums, so the application must calculate the other set. For example, the application calculates the outer IPv4 checksum (which covers the header only) and sets the outer UDP checksum to zero, and the hardware calculates the inner checksums.
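The application-side part of the tunneled example, calculating the outer IPv4 header checksum in software, can be sketched as follows. This is a minimal illustration of the standard one's-complement header checksum, not code from the NTAPI examples; the function names ipv4_header_checksum and ipv4_set_outer_checksum are hypothetical, and the header is assumed to be in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the IPv4 header checksum over `len` header bytes.
 * The checksum field itself (bytes 10-11) is skipped, which is
 * equivalent to treating it as zero. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    /* An IPv4 header is 20 to 60 bytes, in multiples of 4 bytes. */
    assert(hdr != NULL && len >= 20 && len <= 60 && len % 4 == 0);

    for (size_t i = 0; i < len; i += 2) {
        if (i == 10)
            continue;
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    }
    /* Fold the carries back in (one's-complement addition). */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Store the checksum in the header, most significant byte first. */
static void ipv4_set_outer_checksum(uint8_t *hdr, size_t len)
{
    uint16_t csum = ipv4_header_checksum(hdr, len);

    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xFFu);
}
```

An application would call ipv4_set_outer_checksum on the outer header, write zero into the outer UDP checksum field, and supply the inner layer 3 and layer 4 offsets so that the hardware handles the inner checksums.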
Injection of time stamps
A time stamp can be injected into a frame to be transmitted at a specified offset on NT200A02 SmartNICs running on a test and measurement image.
The injected data consists of the 64-bit time stamp preceded by a 2-byte correction value that makes the complete 10 bytes checksum-neutral. The original 10 bytes of frame data must be set to all zeros by the application before the injection, and they must belong to the same layer (fully contained in one checksum domain). The time stamp injection offset must be 2-byte aligned with the start of the data included in the checksum to be recalculated.
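How a 2-byte correction value can make a 10-byte field checksum-neutral can be illustrated with one's-complement arithmetic. The sketch below is only a software model of the idea; how the hardware derives its correction value is not described here. The function names and the big-endian placement of the time stamp in the field are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded one's-complement sum of big-endian 16-bit words. */
static uint16_t ones_sum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    assert(p != NULL && len % 2 == 0);

    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)sum;
}

/* Fill a 10-byte field with a 2-byte correction value followed by
 * the 8-byte time stamp (byte order is an assumption). The
 * correction is the complement of the time stamp's sum, so the five
 * words add up to 0xFFFF, which is equivalent to zero in
 * one's-complement arithmetic. */
static void write_neutral_timestamp(uint8_t *field, uint64_t ts)
{
    uint16_t corr;

    for (int i = 0; i < 8; i++)
        field[2 + i] = (uint8_t)(ts >> (56 - 8 * i));

    corr = (uint16_t)~ones_sum16(field + 2, 8);
    field[0] = (uint8_t)(corr >> 8);
    field[1] = (uint8_t)(corr & 0xFFu);
}
```

Because the field was all zeros before the injection and sums to the equivalent of zero afterwards, a one's-complement checksum over data that fully contains the field is unchanged. This only holds when the field starts on a 2-byte boundary relative to the start of the checksummed data, which matches the alignment requirement above.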
The time stamp injection can be controlled per packet by a control bit in the dynamic packet descriptors. This is illustrated in the net/timestamp_inject code example (see DN-0449). Time stamp injection can also be applied globally by using TimestampInjectAlways in the ntservice.ini file (see DN-0449).
The value of the time stamp can be the time when the first byte of the frame is put on the wire, or the time when the last byte is put on the wire, depending on the setting of TimestampMethod in the ntservice.ini file (see DN-0449).
The time stamp injection offset is determined by the setting of TimestampInjectDynamicOffset and TimestampInjectStaticOffset in the ntservice.ini file (see DN-0449). TimestampInjectDynamicOffset can be set to the start of the frame, the end of the frame, the start of the layer 3 header or the start of the layer 4 header. The actual offset is calculated by adding the value of TimestampInjectStaticOffset to the location selected by TimestampInjectDynamicOffset. If, for instance, TimestampInjectDynamicOffset is set to the end of the frame and TimestampInjectStaticOffset is set to -20, the time stamp is injected 20 bytes before the end of the frame. The combination of the two offsets must result in a location inside the current frame, and there must be enough space for the 2-byte correction value, the 8-byte time stamp and the 4-byte FCS. That is, the resulting offset must be between the start of the frame and 14 bytes before the end of the frame. Otherwise part of the time stamp is overwritten by the FCS, and the checksum will be wrong.
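The offset rule above can be checked in software before a frame is handed over for transmission. The following sketch models the calculation only; the enum, struct and function names are hypothetical and do not correspond to ntservice.ini values or NTAPI types. The frame length is assumed to include the 4-byte FCS, and the 2-byte alignment requirement is not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the four dynamic offset choices. */
enum inject_base {
    BASE_START_OF_FRAME,
    BASE_END_OF_FRAME,
    BASE_L3_HEADER,
    BASE_L4_HEADER
};

struct frame_layout {
    uint32_t frame_len;  /* total length in bytes, FCS included (assumed) */
    uint32_t l3_offset;  /* start of the layer 3 header */
    uint32_t l4_offset;  /* start of the layer 4 header */
};

/* Resolve base + static offset and verify that the 2-byte correction
 * value, the 8-byte time stamp and the 4-byte FCS all fit. */
static bool resolve_inject_offset(const struct frame_layout *f,
                                  enum inject_base base,
                                  int32_t static_offset,
                                  uint32_t *out)
{
    int64_t base_pos = 0;
    int64_t pos;

    assert(f != NULL && out != NULL);

    switch (base) {
    case BASE_START_OF_FRAME: base_pos = 0;            break;
    case BASE_END_OF_FRAME:   base_pos = f->frame_len; break;
    case BASE_L3_HEADER:      base_pos = f->l3_offset; break;
    case BASE_L4_HEADER:      base_pos = f->l4_offset; break;
    }

    pos = base_pos + static_offset;
    if (pos < 0 || pos > (int64_t)f->frame_len - 14)
        return false;  /* outside the frame, or would collide with the FCS */

    *out = (uint32_t)pos;
    return true;
}
```

For a 64-byte frame, end of frame with a static offset of -20 resolves to offset 44, which is accepted, while a static offset of -10 resolves to 54 and is rejected because the time stamp would overlap the FCS.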
Transmit rate limiting
Using NTAPI, the transmit rate can be limited per port using the NT_ConfigWrite function with the NT_CONFIG_PARM_PORT_SETTINGS_V2 parameter and the NtPortSettings_v2_s structure. Set NtPortSettings_v2_s.txPortRateLimit to the desired limit in bit/s (see DN-0449). A value in the range 1 to 100 is interpreted as a percentage of the link speed.
Using the config tool, apply the --tx_port_rate_limit option with a numeric value (see DN-0449). Without a unit, the value is interpreted as bit/s. A unit can be appended to the value. The units b, K, M and G work as ×1, ×1,000, ×1,000,000 and ×1,000,000,000 multipliers. With the unit %, a value in the range 1 to 100 is interpreted as a percentage of the link speed.
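The unit handling described for --tx_port_rate_limit can be modeled as a small parser. This is a sketch of the documented semantics, not the config tool's implementation; parse_rate_limit and its parameters are hypothetical names, and the link speed is passed in by the caller so that percentages can be resolved to bit/s.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert text such as "500M", "2G", "1500000" or "50%" to bit/s.
 * Returns false for malformed input or an out-of-range percentage. */
static bool parse_rate_limit(const char *text, uint64_t link_speed_bps,
                             uint64_t *out_bps)
{
    char *end = NULL;
    unsigned long long value;
    uint64_t mult;

    assert(text != NULL && out_bps != NULL);

    /* Require a leading digit: strtoull would otherwise accept
     * whitespace and a sign. */
    if (text[0] < '0' || text[0] > '9')
        return false;

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0)
        return false;

    switch (*end) {
    case '\0':
    case 'b': mult = 1ULL;          break;
    case 'K': mult = 1000ULL;       break;
    case 'M': mult = 1000000ULL;    break;
    case 'G': mult = 1000000000ULL; break;
    case '%':
        if (end[1] != '\0' || value < 1 || value > 100)
            return false;
        *out_bps = link_speed_bps / 100 * value;
        return true;
    default:
        return false;
    }

    /* Nothing may follow the unit character. */
    if (*end != '\0' && end[1] != '\0')
        return false;
    /* Guard against overflow when applying the multiplier. */
    if (value > UINT64_MAX / mult)
        return false;

    *out_bps = (uint64_t)value * mult;
    return true;
}
```

With a 10 Gbit/s link, "500M" resolves to 500,000,000 bit/s and "50%" to 5,000,000,000 bit/s, while "150%" is rejected.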
Flushing TX packets
Remaining packets in TX segments can be canceled from their queues and not sent, so that a stream can be closed down faster, by setting the CancelTxOnCloseMask parameter in the ntservice.ini file (see DN-0449). This is useful when replaying low-rate traffic.
Transmission of PCAP files
The SmartNICs have native support for transmission of PCAP files. Configuration can be done using the NT_NetTxOpen_v2 and NT_NetFileOpen_v2 functions (see DN-0449).