4GA Feature Overview

Link™ Capture Software 12.0 Release Summary

Platform: Intel® PAC, Napatech SmartNIC
Content Type: Release Summary
Capture Software Version: Link™ Capture Software 12.0

In this section

This section gives an overview of the 4GA features in the Link™ Capture Software 12.0 release.

Packet descriptors

These packet descriptors are available:

  • Standard packet descriptor
  • Dynamic packet descriptor 1
  • Dynamic packet descriptor 2
  • Dynamic packet descriptor 3
  • Dynamic packet descriptor 4
  • PCAP packet descriptor
Note: Dynamic packet descriptor 4 does not apply to NT40A01-4×10/1-SLB.

Packet descriptor extension

For frames with a standard packet descriptor, extended packet descriptor 9 is available.

Statistics counters per port

RMON1 and extended RMON1 statistics counters are available for each port.

Statistics counters per color

These statistics counters are available for each color:

  • Frame counters
  • Byte counters

Statistics counters per stream ID

These statistics counters are available for each stream ID:

  • Frame counters for forwarded frames
  • Frame counters for flushed frames
  • Frame counters for dropped frames
  • Byte counters for forwarded frames
  • Byte counters for flushed frames
  • Byte counters for dropped frames

Port merging

Frames from multiple ports on the same SmartNIC can be merged into one stream.

QPI bypass

When two NT100E3-1-PTP SmartNICs are connected and working as an NT200C01 solution, data received on one NT100E3-1-PTP SmartNIC can be transferred via the interconnect cable to the other NT100E3-1-PTP SmartNIC, and vice versa. In this way, data destined for a specific NUMA node can be delivered over the PCIe bus of the SmartNIC that is local to this NUMA node. The QPI in the host is thereby bypassed, which avoids introducing additional latency.

CPU socket load-balancing

Two NT40A01 SmartNICs running on the 4 × 10/1 Gbit/s SLB image, or two NT200A02 SmartNICs running on the 2 × 40 Gbit/s SLB image or on the 8 × 10 Gbit/s SLB image, can be bonded in a master/slave configuration. All traffic received on the master SmartNIC is then replicated to the slave SmartNIC, which ensures local cache access to two NUMA nodes in a dual-CPU-socket server and optimizes performance.

Time stamp formats

Received frames are time-stamped according to the internal SmartNIC time either when the first byte is received or when the last byte is received.
Note: Start-of-frame time-stamping does not apply to NT40E3-4-PTP SmartNICs running on the capture/replay image, to NT40A01-4×10/1-SLB SmartNICs nor to the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.

These time stamp formats are available:

  • PCAP-ns format, 1 ns (only if PCAP packet descriptor is selected)
  • Native UNIX format, 1 ns (not for NT40A01-4×10/1-SLB)
  • Native UNIX format, 10 ns
  • PCAP-μs format, 1000 ns (only if PCAP packet descriptor is selected)

All received frames use the same time stamp format. Offset compensation is available.
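As an illustration of how the resolutions above relate to each other, the following minimal Python sketch converts a raw time stamp into the seconds/fraction pairs used by the two PCAP formats. It assumes that a native time stamp is a plain integer count of ticks since the UNIX epoch; that layout, and the function names, are assumptions made for this example and are not taken from the product API.

```python
from typing import Tuple

NS_PER_SECOND = 1_000_000_000


def ticks_to_ns(ticks: int, resolution_ns: int) -> int:
    """Convert a raw tick count to nanoseconds (assumed: ticks since the UNIX epoch)."""
    return ticks * resolution_ns


def to_pcap_ns(total_ns: int) -> Tuple[int, int]:
    """Split nanoseconds into (seconds, nanoseconds), as in PCAP-ns."""
    return divmod(total_ns, NS_PER_SECOND)


def to_pcap_us(total_ns: int) -> Tuple[int, int]:
    """Split nanoseconds into (seconds, microseconds), as in PCAP-us.

    The sub-microsecond part is truncated, which is why this format
    has a resolution of 1000 ns.
    """
    seconds, nanos = divmod(total_ns, NS_PER_SECOND)
    return seconds, nanos // 1000


# Example: a hypothetical raw value with a 10 ns tick resolution.
raw_ticks = 170_000_000_012_345_678
total_ns = ticks_to_ns(raw_ticks, resolution_ns=10)
pcap_ns = to_pcap_ns(total_ns)
pcap_us = to_pcap_us(total_ns)
```

Working in integer nanoseconds throughout avoids the rounding errors that floating-point seconds would introduce at these magnitudes.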

RX data path delay values are available in the API to allow applications to compensate for these delays.

Time stamp synchronization

An Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA can be synchronized to OS time with dynamic drift adjustment.

Two or more NT SmartNICs can be synchronized, one being the master and the other(s) being slaves connected in a daisy chain.

When configured as master and connected to another NT SmartNIC, an NT SmartNIC can provide a synchronized set of external time and SmartNIC time every second. The external time is the time of the other SmartNIC. Synchronized sets of external time and SmartNIC time can also be obtained every 20 μs.

When an NT SmartNIC is configured as master and connected to a time device, for instance a GPS, the PPS signal from the time device can trigger the sampling of the SmartNIC time.

The NT SmartNIC time stamp clock rate can be synchronized relative to any external time source with a PPS output at TTL levels. If absolute time information is available, the SmartNIC time clock can be synchronized to the absolute UTC time.

Minor adjustments to the internal NT SmartNIC time can be made using a sliding adjustment, and cable delays can be compensated for.

When two or more NT SmartNICs are synchronized with respect to absolute time, block statistics are transferred synchronously from the SmartNICs to the host.

For all time reference sources (with OS, NT-TS, PPS or PTP time synchronization) the same time synchronization statistics are supported for NT SmartNICs:

  • Current offset to master
  • Mean value
  • Minimum and maximum offset to master
  • Peak-to-peak jitter
  • Calculated mean offset to master
  • Calculated standard deviation
  • Time since last reset of statistics calculation
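
To make these quantities concrete, the following Python sketch derives them from a series of offset samples. It is only an illustration: the function name and the dictionary keys are hypothetical, peak-to-peak jitter is assumed to be the difference between the maximum and minimum offset, and the population standard deviation is assumed. The SmartNIC may calculate these values differently.

```python
import statistics
from typing import Dict, Sequence


def offset_statistics(offsets_ns: Sequence[int]) -> Dict[str, float]:
    """Summarize offset-to-master samples (in ns) collected since the last reset."""
    if not offsets_ns:
        raise ValueError("at least one offset sample is required")
    lowest, highest = min(offsets_ns), max(offsets_ns)
    return {
        "current": offsets_ns[-1],              # most recent sample
        "min": lowest,
        "max": highest,
        "peak_to_peak": highest - lowest,       # assumed jitter definition
        "mean": statistics.fmean(offsets_ns),
        "stddev": statistics.pstdev(offsets_ns),  # population standard deviation
    }


# Example with four hypothetical samples.
stats = offset_statistics([-20, 10, 40, -10])
```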

IEEE 1588-2008 PTP v2 clock synchronization

The IEEE 1588-2008 PTP v2 clock synchronization functionality allows the SmartNICs to be synchronized against a PTP grandmaster, using the PTP Ethernet port on the SmartNIC.
Note: IEEE 1588-2008 PTP v2 clock synchronization does not apply to the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.

These PTP Ethernet port configurations are supported: Static IP, DHCP and VLAN.

These communication protocols are supported: IPv4/UDP and IEEE 802.3 (layer 2).

These PTP profiles are supported: PTP Default, Telecom, Power and Enterprise.

The SmartNICs support both the end-to-end mechanism (delay request-response) and the peer-to-peer mechanism (peer delay) for propagation delay measurements.

SyncE support

NT SmartNICs can utilize SyncE-enabled networks to provide highly stable frequency synchronization.
Note: SyncE does not apply to NT200A02 and NT200A01 SmartNICs nor to the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.

Frame classification

Frames are inspected and classified by the frame decoder. These protocols are identified for the different layers:

  • Layer 2: ISL encapsulation, VLAN tags, MPLS encapsulation, VN-Tag classification (not for NT40E3-4-PTP running on the capture/replay image nor NT40A01-4×10/1-SLB)
  • Layer 2: EtherII, Novell_RAW, SNAP, LLC, others
  • Layer 3: IPv4, IPv6, others
  • Layer 4: TCP, UDP, ICMP, SCTP, others
  • Tunnel type: GTPv0-U, GTPv1-C, GTPv2-C, GTPv1-U, GRE_v0 (including NVGRE), GRE_v1, IPinIP, EtherIP, VXLAN, others
  • Inner layer 2: VLAN tags, MPLS encapsulation
  • Inner layer 3: IPv4, IPv6, others
  • Inner layer 4: TCP, UDP, ICMP, GRE_v0, SCTP, others
Frames can also be classified as small, large or jumbo frames.
Note: IPv6 is generally supported to the same level as IPv4. In addition, the frame decoder supports a broad range of IPv6 extension headers, as well as filtering based on IPv6 addresses.

Filtering

Filtering can be based on:

  • Port numbers
  • Pattern compares
  • Protocol information
  • Frame size tests
  • Frame error tests
  • Key matching (not for NT200A02 running on the 2 × 40 Gbit/s FM image nor on the 8 × 10 Gbit/s FM image)
  • IP address matching including address groups and wildcard matches (not for NT200A02 running on the 2 × 40 Gbit/s FM image nor on the 8 × 10 Gbit/s FM image)
  • User-defined key tests that include arbitrary fields

Overlapping filters can be prioritized.

Filtering on tunneled IP traffic is available for GTPv0-U, GRE_v0, IPinIP and EtherIP tunnels.

Packet coloring

The packet coloring functionality enables tagging of captured frames with a color ID based on the filter logic. The color ID can contain contributions from one filter with the highest priority (color) and/or contributions from a number of filters that the frame matches (color mask). Packet coloring can be used, for instance, in connection with multi-CPU distribution (see Multi-CPU distribution).
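
The difference between the color and the color mask can be sketched in a few lines of Python. The Filter class, its fields and the convention that a lower priority number means a higher priority are all assumptions made for this example; they do not reflect the actual filter configuration syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Filter:
    priority: int                      # assumption: lower number = higher priority
    color: int                         # color contributed if this filter wins
    mask_bit: int                      # bit contributed to the color mask on a match
    matches: Callable[[Dict], bool]    # hypothetical match predicate


def color_frame(frame: Dict, filters: List[Filter]) -> Tuple[Optional[int], int]:
    """Return (color, color_mask) for a frame.

    The color comes from the single highest-priority matching filter;
    the color mask ORs together one bit for every matching filter.
    """
    matched = [f for f in filters if f.matches(frame)]
    if not matched:
        return None, 0
    winner = min(matched, key=lambda f: f.priority)
    color_mask = 0
    for f in matched:
        color_mask |= 1 << f.mask_bit
    return winner.color, color_mask


example_filters = [
    Filter(priority=1, color=7, mask_bit=0, matches=lambda fr: fr["l4"] == "TCP"),
    Filter(priority=0, color=3, mask_bit=1, matches=lambda fr: fr["dst_port"] == 80),
    Filter(priority=2, color=9, mask_bit=2, matches=lambda fr: fr["l4"] == "UDP"),
]
```

A TCP frame to port 80 matches the first two filters in this example, so it receives the color of the port filter (the higher priority of the two) and a color mask with bits 0 and 1 set.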

Correlation of packets

Correlation keys can be generated to enable quality of service measurements between multiple points in a network and to accelerate packet deduplication in application software.
Note: Correlation of packets does not apply to NT200A02 running on the 2 × 40 Gbit/s FM image nor on the 8 × 10 Gbit/s FM image, to NT200A01 running at 2 × 100 Gbit/s nor to NT40A01-4×10/1-SLB.

Deduplication

The deduplication functionality enables discarding or retransmission of duplicate or nonduplicate frames. Frames are considered duplicates if they (or a part of them) have the same correlation key, are not separated by more than a specified time, and belong to the same specified part of the traffic. The deduplication functionality generates per-port statistics on the number of frames discarded, retransmitted and detected as duplicates.

Protocol offsets and masking settings can be used to determine which parts of the frames are compared.
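
The duplicate criteria described above can be sketched in software as follows. This is a simplified model, not the SmartNIC implementation: the correlation key is assumed to be a CRC32 over the frame bytes from a given offset, the mask is assumed to be ANDed byte by byte, and the class and function names are hypothetical.

```python
import zlib
from typing import Dict


def correlation_key(frame: bytes, offset: int, mask: bytes = b"") -> int:
    """Build a key from the frame bytes starting at `offset`.

    Bytes covered by `mask` are ANDed with it, so a mask byte of 0x00
    excludes that byte from the comparison. CRC32 is an assumption.
    """
    data = bytearray(frame[offset:])
    for i, m in enumerate(mask[:len(data)]):
        data[i] &= m
    return zlib.crc32(bytes(data))


class Deduplicator:
    """Flags a frame as a duplicate if its key was seen within the time window."""

    def __init__(self, window_ns: int) -> None:
        self.window_ns = window_ns
        self.last_seen: Dict[int, int] = {}   # key -> time stamp of last frame
        self.duplicates = 0                   # simple duplicate counter

    def is_duplicate(self, key: int, timestamp_ns: int) -> bool:
        previous = self.last_seen.get(key)
        self.last_seen[key] = timestamp_ns
        if previous is not None and timestamp_ns - previous <= self.window_ns:
            self.duplicates += 1
            return True
        return False
```

In this sketch, two frames that differ only in bytes before the offset, or in masked-out bytes, produce the same key and are treated as duplicates when they arrive within the window.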

Note: Deduplication does not apply to NT200A02 running on the 2 × 40 Gbit/s FM image nor on the 8 × 10 Gbit/s FM image, to NT200A01 running at 2 × 100 Gbit/s nor 2 × 25 Gbit/s, to NT100E3-1-PTP, to NT40E3-4-PTP running on the capture/replay image nor to NT40A01 running on the 4 × 10/1 Gbit/s SLB image.

Slicing

Slicing can be dynamic, fixed, or a combination of both, and it can also be disabled. These slicing modes are available:

  • Fixed length
  • Fixed length + ISL
  • Fixed length + ISL + ETH + VLAN
  • Fixed length + ISL + ETH + VLAN + MPLS
  • Fixed length + ISL + ETH + VLAN + MPLS + L3
  • Fixed length + ISL + ETH + VLAN + MPLS + L3 + L4
  • Fixed length + ISL + ETH + VLAN + MPLS + L3 + L4 + outer data type
  • Fixed length + ISL + ETH + VLAN + MPLS + L3 + L4 + outer data type + inner L3
  • Fixed length + ISL + ETH + VLAN + MPLS + L3 + L4 + outer data type + inner L3 + inner L4
  • End of frame

The end-of-frame dynamic offset enables bytes to be sliced off from the end of the frame by applying a negative offset. This can be used, for instance, for frame checksum removal.
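
As a small illustration of end-of-frame slicing with a negative offset, the following Python sketch removes a trailing 4-byte checksum from a frame. The function name is hypothetical and the example operates on a plain byte string rather than on captured data.

```python
def slice_end_of_frame(frame: bytes, offset: int) -> bytes:
    """Keep the frame up to `offset` bytes relative to its end.

    `offset` must be zero or negative; an offset of -4 drops the last
    four bytes, for instance a frame checksum.
    """
    if offset > 0:
        raise ValueError("end-of-frame offset must be zero or negative")
    keep = max(0, len(frame) + offset)   # never slice past the start of the frame
    return frame[:keep]


# Example: 8 bytes of data followed by a 4-byte checksum placeholder.
frame = b"ABCDEFGH" + b"\xde\xad\xbe\xef"
without_fcs = slice_end_of_frame(frame, -4)
```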

Hash value generation

Hash values can be generated from many types of hash keys based on packet header information:

  • Last MPLS label
  • All MPLS labels
  • 2-tuple
  • 2-tuple, sorted
  • Last VLAN ID
  • All VLAN IDs
  • 5-tuple
  • 5-tuple, sorted
  • 3-tuple GREv0
  • 3-tuple GREv0, sorted
  • 5-tuple SCTP
  • 5-tuple SCTP, sorted
  • 3-tuple GTPv0
  • 3-tuple GTPv0, sorted
  • 3-tuple GTPv1 or GTPv2
  • 3-tuple GTPv1 or GTPv2, sorted
  • Inner 2-tuple
  • Inner 2-tuple, sorted
  • Inner 5-tuple
  • Inner 5-tuple, sorted
  • IP fragment tuple
  • Round-robin

Hash keys can be selected dynamically for different types of frames.

Source and destination addresses and ports can be swapped in hash calculations. Hash swapping can be based on inner and/or outer IP match lists specifying certain IP addresses.

Multi-CPU distribution (see Multi-CPU distribution) can be controlled using hash key masks. Hash word bits that are masked out are set to 0. In this way certain parts of the input data can be disregarded from the hash calculation, so that frames with hash values that only differ due to, for instance, port numbers can be configured to end up in the same host buffer.
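
The effect of sorted hash keys and of hash key masks can be sketched as follows. This is an illustrative model only: CRC32 stands in for the actual hash function, which is not described here, and the FiveTuple type, the flow_hash function and the modulo-based buffer selection are assumptions made for this example.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def flow_hash(t: FiveTuple, sorted_key: bool = False, mask_ports: bool = False) -> int:
    """Hash a 5-tuple, optionally sorted and/or with the ports masked out."""
    # Masked-out fields are set to 0 so that they do not influence the hash.
    a = (t.src_ip, 0 if mask_ports else t.src_port)
    b = (t.dst_ip, 0 if mask_ports else t.dst_port)
    # Sorting the endpoints gives both directions of a flow the same key.
    if sorted_key and b < a:
        a, b = b, a
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{t.protocol}".encode()
    return zlib.crc32(key)


def select_buffer(hash_value: int, buffer_count: int) -> int:
    """Map a hash value to a host buffer index (assumed: simple modulo)."""
    return hash_value % buffer_count


forward = FiveTuple("10.0.0.1", "10.0.0.2", 1234, 80, 6)
reverse = FiveTuple("10.0.0.2", "10.0.0.1", 80, 1234, 6)
```

With a sorted key, the forward and reverse directions of a connection hash to the same value and therefore land in the same host buffer. With the ports masked out, all flows between the same pair of IP addresses do.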

Multi-CPU distribution

Multi-CPU distribution enables the SmartNIC to offload CPU load balancing from the host by distributing captured frames for processing across the host CPUs. Data can be placed in separate buffers based on port numbers, hash values and filtering.

IP fragment handling

The IP fragment handling functionality accelerates the processing of fragmented IP traffic and enables the use of 5-tuple (or other non-2-tuple) hash keys on fragments for better CPU load distribution.
Note: IP fragment handling does not apply to NT200A01 running at 2 × 100 Gbit/s.

Stateful flow management (NEW)

The stateful flow management functionality can recognize received frames belonging to specific flows, and apply the same action to these frames, while statistics about the flow are being updated in a flow record. The frames can, for instance, be forwarded to a specific host Rx queue, dropped or fast forwarded to a network port.

The frames are decoded and lookups are made in a flow table to recognize the frames. The flow table can be updated according to learned flows. Flows can be based on several individual fields from the packets, for instance as 5-tuples, 4-tuples, 3-tuples, 2-tuples or combinations.

Flows can be terminated based on TCP flow terminations, timeouts or application requests.
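
The lookup, learning and termination steps described above can be modeled with a small flow table. The Python sketch below is a simplified illustration and not the SmartNIC implementation: the class names, the action strings and the rule that a single TCP termination flag ends a flow are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class FlowRecord:
    action: str            # e.g. "forward", "drop" (hypothetical action names)
    frames: int = 0
    octets: int = 0
    last_seen_ns: int = 0


class FlowTable:
    def __init__(self, timeout_ns: int) -> None:
        self.timeout_ns = timeout_ns
        self.flows: Dict[Hashable, FlowRecord] = {}

    def learn(self, key: Hashable, action: str, now_ns: int) -> None:
        """Add a flow, for instance after the application has classified it."""
        self.flows[key] = FlowRecord(action=action, last_seen_ns=now_ns)

    def process(self, key: Hashable, frame_len: int, now_ns: int,
                tcp_terminated: bool = False) -> str:
        """Look up a frame's flow, update its record and return the action."""
        record = self.flows.get(key)
        if record is None:
            return "to_application"      # unknown flows go to the application
        record.frames += 1
        record.octets += frame_len
        record.last_seen_ns = now_ns
        if tcp_terminated:               # simplified TCP flow termination
            del self.flows[key]
        return record.action

    def expire(self, now_ns: int) -> Dict[Hashable, FlowRecord]:
        """Remove and return flows that have been idle longer than the timeout."""
        idle = [k for k, r in self.flows.items()
                if now_ns - r.last_seen_ns > self.timeout_ns]
        return {k: self.flows.pop(k) for k in idle}
```

Returning the expired records from expire() mirrors the idea that the statistics collected in a flow record remain available when the flow ends.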

Stateful flow management can, for instance, be used to offload security applications by processing frames from known flows entirely on the SmartNIC and only forwarding frames that are not recognized to the application.

Note: Stateful flow management only applies to NT200A02 running on the 2 × 40 Gbit/s FM image or on the 8 × 10 Gbit/s FM image.

Local retransmission

The local retransmission functionality enables frames received on one network port to be retransmitted to the same port or to another network port on the same SmartNIC without involving the host CPU. The retransmitted frames can be expanded to include a trailer containing a 64-bit RX time stamp with a resolution of 1 ns.

Note: Local retransmission does not apply to NT200A01 running at 2 × 100 Gbit/s.

Line loopback

Frames received on one network port can be retransmitted to the same network port without involving the host CPU. The line loopback functionality and the filtering/capturing functionality can be used independently of each other.

Host-based transmission

The full host-based transmission functionality enables high-speed transmission with low CPU load of frames located in a host buffer in the server application memory. Frames that have been received by a SmartNIC can be retransmitted by the same or a different SmartNIC without modification.
Note: The full host-based transmission functionality does not apply to NT200A01-2×100 running on the capture image nor to NT40E3-4-PTP running on the capture image.

Transmission can be either static, where all frames are sent to a single port, or dynamic, where the application can assign different TX ports for different frames.

User data, such as VLAN tags, can be inserted into the transmitted frames using dynamic descriptor 3.

Transmission can be timed so that frames are transmitted at specific points in time. In this way frames can be transmitted, for instance, according to their RX time stamps, so that they are replayed as captured. Timed transmission also allows synchronized replay of traffic from a number of different SmartNICs when their time stamp clocks are synchronized.
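
Replaying frames as captured amounts to preserving the gaps between their RX time stamps. The following Python sketch computes the TX times for such a replay; the function and parameter names are hypothetical, and the actual scheduling is performed by the SmartNIC rather than by code like this.

```python
from typing import List, Optional, Sequence


def schedule_replay(rx_timestamps_ns: Sequence[int], start_ns: int,
                    reference_ns: Optional[int] = None) -> List[int]:
    """Map RX time stamps to TX times that preserve the captured inter-frame gaps.

    `reference_ns` is the capture time that should map to `start_ns`.
    It defaults to the first RX time stamp. Passing the same reference and
    start time for several captures keeps them aligned with each other,
    provided the transmitting clocks are synchronized.
    """
    if not rx_timestamps_ns:
        return []
    if reference_ns is None:
        reference_ns = rx_timestamps_ns[0]
    return [start_ns + (ts - reference_ns) for ts in rx_timestamps_ns]


# Example: three frames captured 150 ns and 650 ns apart.
tx_times = schedule_replay([100, 250, 900], start_ns=10_000)
```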

Limited host-based transmission

Limited host-based transmission only applies to NT200A01-2×100 running on the capture image and to NT40E3-4-PTP running on the capture image. This host-based transmission is very CPU-intensive and has a very limited TX rate.

Buffer system

The buffer system supports the use of up to 128 RX buffers per SmartNIC in host memory with dynamic host buffer segment size and up to 64 RX buffers with static host buffer segment size.
Note: Dynamic segment size does not apply to the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.