Feature Set N-ANL10

Huntington Beach 3 Release Summary

Platform: Napatech SmartNIC
Content Type: Release Summary
Capture Software Version: N-ANL10

In this section

This section gives an overview of the features in feature set N-ANL10.

Packet descriptors

These packet descriptors are available:

  • Standard packet descriptor
  • Dynamic packet descriptor 1
  • Dynamic packet descriptor 2
  • Dynamic packet descriptor 3
  • PCAP packet descriptor
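
The descriptor prepended to captured frames is typically selected per stream when traffic is assigned to the stream. A minimal sketch, assuming configuration through an NTAPI configuration stream and the NTPL Descriptor option of the Assign command (the option name and the DYN3 value are assumptions based on common Napatech examples, not taken from this summary):

  #include <nt.h>
  #include <stdio.h>

  int main(void)
  {
      NtConfigStream_t hCfg;
      NtNtplInfo_t info;
      int status = NT_Init(NTAPI_VERSION);

      if (status == NT_SUCCESS)
          status = NT_ConfigOpen(&hCfg, "descriptor_example");
      if (status == NT_SUCCESS) {
          /* Deliver all frames to stream ID 0 with dynamic packet descriptor 3
             prepended (assumed NTPL option name and value). */
          status = NT_NTPL(hCfg, "Assign[StreamId=0; Descriptor=DYN3] = All",
                           &info, NT_NTPL_PARSER_VALIDATE_NORMAL);
          NT_ConfigClose(hCfg);
      }
      if (status != NT_SUCCESS)
          fprintf(stderr, "configuration failed (status 0x%x)\n", status);
      NT_Done();
      return status == NT_SUCCESS ? 0 : 1;
  }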

Packet descriptor extension

For frames with a standard packet descriptor, extended packet descriptor 9 is available.

Statistics counters per port

RMON1 and extended RMON1 statistics counters are available for each port.

Statistics counters per color

These statistics counters are available for each color:

  • Frame counters
  • Byte counters

Statistics counters per stream ID

These statistics counters are available for each stream ID:

  • Frame counters for forwarded frames
  • Frame counters for flushed frames
  • Frame counters for dropped frames
  • Byte counters for forwarded frames
  • Byte counters for flushed frames
  • Byte counters for dropped frames
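
The per-port, per-color and per-stream counters are read by the application through an NTAPI statistics stream. A minimal sketch that prints the RMON1 frame and byte counters for port 0, assuming the query_v2 read command and the aPorts[]/RMON1 field layout (these field paths are assumptions and may vary between driver releases; per-color and per-stream counters are read from the same NtStatistics_t structure):

  #include <nt.h>
  #include <stdio.h>

  int main(void)
  {
      NtStatStream_t hStat;
      NtStatistics_t stat;
      int status = NT_Init(NTAPI_VERSION);

      if (status == NT_SUCCESS)
          status = NT_StatOpen(&hStat, "stat_example");
      if (status == NT_SUCCESS) {
          stat.cmd = NT_STATISTICS_READ_CMD_QUERY_V2;  /* assumed query command */
          stat.u.query_v2.poll = 1;   /* read the most recent counter snapshot */
          stat.u.query_v2.clear = 0;  /* do not clear counters on read */
          status = NT_StatRead(hStat, &stat);
          if (status == NT_SUCCESS) {
              /* Field paths below are assumptions based on typical NTAPI usage. */
              printf("Port 0 RX frames: %llu, bytes: %llu\n",
                     (unsigned long long)stat.u.query_v2.data.port.aPorts[0].rx.RMON1.pkts,
                     (unsigned long long)stat.u.query_v2.data.port.aPorts[0].rx.RMON1.octets);
          }
          NT_StatClose(hStat);
      }
      NT_Done();
      return status == NT_SUCCESS ? 0 : 1;
  }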

Port merging

Frames from multiple ports on the same accelerator can be merged into one stream.

QPI bypass

When two NT100E3-1-PTP accelerators are connected and working as an NT200C01 solution, data received on one NT100E3-1-PTP accelerator can be transferred via the interconnect cable to the other accelerator and vice versa. In this way, data destined for a specific NUMA node can be delivered over the PCIe bus of the accelerator that is local to that NUMA node, bypassing the QPI link in the host and avoiding the additional latency it would otherwise introduce.

CPU socket load-balancing (NEW)

Two NT40A01-4×10/1-SLB accelerators can be bonded in a master/slave configuration so that all traffic received on the master accelerator is replicated to the slave accelerator, ensuring local cache access to two NUMA nodes in a dual-CPU-socket server to optimize performance.

Time stamp formats

Received frames are time-stamped according to the internal accelerator time when the last byte is received.

These time stamp formats are available:

  • PCAP-ns format, 1 ns (only if PCAP packet descriptor is selected)
  • Native UNIX format, 10 ns
  • PCAP-μs format, 1000 ns (only if PCAP packet descriptor is selected)

All received frames use the same time stamp format. Offset compensation is available.

RX data path delay values are available in the API to allow applications to compensate for these delays.
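
A minimal sketch of how an application might read the RX time stamp of captured frames, assuming an NTAPI packet-interface RX stream on stream ID 0 (the stream ID, timeout and loop bound are arbitrary example values; with the native UNIX format the time stamp is in 10 ns units, with the PCAP formats it follows the selected PCAP resolution):

  #include <nt.h>
  #include <stdio.h>

  int main(void)
  {
      NtNetStreamRx_t hRx;
      NtNetBuf_t hNetBuf;
      int status, frames = 0;

      if ((status = NT_Init(NTAPI_VERSION)) != NT_SUCCESS)
          return 1;
      /* Open a packet-interface RX stream on stream ID 0; -1 disables the
         host buffer allowance. */
      if ((status = NT_NetRxOpen(&hRx, "ts_example", NT_NET_INTERFACE_PACKET, 0, -1)) != NT_SUCCESS) {
          NT_Done();
          return 1;
      }
      while (frames < 10) {  /* print the time stamps of the first 10 frames */
          status = NT_NetRxGet(hRx, &hNetBuf, 1000 /* ms */);
          if (status == NT_STATUS_TIMEOUT || status == NT_STATUS_TRYAGAIN)
              continue;      /* no frame available yet; poll again */
          if (status != NT_SUCCESS)
              break;
          printf("RX time stamp: %llu, wire length: %u bytes\n",
                 (unsigned long long)NT_NET_GET_PKT_TIMESTAMP(hNetBuf),
                 (unsigned)NT_NET_GET_PKT_WIRE_LENGTH(hNetBuf));
          NT_NetRxRelease(hRx, hNetBuf);
          frames++;
      }
      NT_NetRxClose(hRx);
      NT_Done();
      return 0;
  }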

Time stamp synchronization

Two or more accelerators can be synchronized, one acting as the master and the others as slaves, connected in a daisy chain.

When configured as master and connected to another NT accelerator, an NT accelerator can provide a synchronized set of external time and accelerator time every second. The external time is the time of the other accelerator. Synchronized sets of external time and accelerator time can also be obtained every 20 μs.

When an accelerator is configured as master and connected to a time device, for instance a GPS, the PPS signal from the time device can trigger the sampling of the accelerator time.

The accelerator time stamp clock rate can be synchronized relative to any external time source with a PPS output at TTL levels. If absolute time information is available, the accelerator time clock can be synchronized to the absolute UTC time.

Minor adjustments to the internal accelerator time can be made using a sliding adjust, and cable delays can be compensated for.

When two or more accelerators are synchronized with respect to absolute time, block statistics are transferred synchronously from the accelerators to the host.

For all time reference sources (OS, NT-TS, PPS or PTP time synchronization), the same time synchronization statistics are supported:

  • Current offset to master
  • Mean value
  • Minimum and maximum offset to master
  • Peak-to-peak jitter
  • Calculated mean offset to master
  • Calculated standard deviation
  • Time since last reset of statistics calculation

IEEE 1588-2008 PTP v2 clock synchronization

The IEEE 1588-2008 PTP v2 clock synchronization functionality allows the accelerators to be synchronized against a PTP grandmaster, using the PTP Ethernet port on the accelerator.

These PTP Ethernet port configurations are supported: Static IP, DHCP and VLAN.

These communication protocols are supported: IPv4/UDP and IEEE 802.3 (Layer 2).

These PTP profiles are supported: PTP Default and Telecom.

The accelerators support both the end-to-end (delay request-response) and peer-to-peer (peer delay) mechanisms for propagation delay measurement.

SyncE support

NT accelerators can utilize SyncE-enabled networks to provide highly stable frequency synchronization.
Note: SyncE does not apply to NT200A01 accelerators.

Frame classification

Frames are inspected and classified by the frame decoder. These protocols are identified for the different layers:

  • Layer 2 encapsulation: ISL, VLAN tags, MPLS
  • Layer 2 frame type: EtherII, Novell_RAW, SNAP, LLC, others
  • Layer 3: IPv4, IPv6, others
  • Layer 4: TCP, UDP, ICMP, SCTP, others
  • Tunnel type: GTPv0-U, GTPv1-C, GTPv2-C, GTPv1-U, GRE_v0 (including NVGRE), GRE_v1, IPinIP, EtherIP, VXLAN, others
  • Inner layer 2: VLAN tags, MPLS encapsulation
  • Inner layer 3: IPv4, IPv6, others
  • Inner layer 4: TCP, UDP, ICMP, GRE_v0, SCTP, others
Frames can also be classified as small, large or jumbo frames.
Note: IPv6 is generally supported to the same level as IPv4. In addition, the frame decoder supports a broad range of IPv6 extension headers, as well as filtering based on IPv6 addresses.

IP fragment handling

The IP fragment handling functionality accelerates the processing of fragmented IP traffic and enables the use of 5-tuple (or other non-2-tuple) hash keys on fragments for better CPU load distribution.
Note: IP fragment handling does not apply to NT200A01-2×100 accelerators nor to NT200A01-2×100/40 running at 2 × 100 Gbit/s.

Hash value generation

Hash values can be generated from many types of hash keys based on packet header information:

  • Last MPLS label
  • All MPLS labels
  • 2-tuple
  • 2-tuple, sorted
  • Last VLAN ID
  • All VLAN IDs
  • 5-tuple
  • 5-tuple, sorted
  • 3-tuple GREv0
  • 3-tuple GREv0, sorted
  • 5-tuple SCTP
  • 5-tuple SCTP, sorted
  • 3-tuple GTPv0
  • 3-tuple GTPv0, sorted
  • 3-tuple GTPv1 or GTPv2
  • 3-tuple GTPv1 or GTPv2, sorted
  • Inner 2-tuple
  • Inner 2-tuple, sorted
  • Inner 5-tuple
  • Inner 5-tuple, sorted
  • IP fragment tuple
  • Round-robin

Hash keys can be selected dynamically for different types of frames.

Source and destination addresses and ports can be swapped in hash calculations. Hash swapping can be based on inner and/or outer IP match lists specifying certain IP addresses.

Multi-CPU distribution (see Multi-CPU distribution) can be controlled using hash key masks. Hash word bits that are masked out are set to 0. In this way, certain parts of the input data can be excluded from the hash calculation, so that frames whose hash values would otherwise differ only due to, for instance, port numbers end up in the same host buffer.
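
A minimal sketch of selecting a hash key, assuming the NTPL HashMode command submitted through an NTAPI configuration stream (the statement text is an assumption based on common Napatech examples; hash key masks and address/port swapping are configured in a similar way but are not shown):

  #include <nt.h>
  #include <stdio.h>

  int main(void)
  {
      NtConfigStream_t hCfg;
      NtNtplInfo_t info;
      int status = NT_Init(NTAPI_VERSION);

      if (status == NT_SUCCESS)
          status = NT_ConfigOpen(&hCfg, "hash_example");
      if (status == NT_SUCCESS) {
          /* Generate hash values from the sorted 5-tuple of each frame. */
          status = NT_NTPL(hCfg, "HashMode = Hash5TupleSorted",
                           &info, NT_NTPL_PARSER_VALIDATE_NORMAL);
          NT_ConfigClose(hCfg);
      }
      if (status != NT_SUCCESS)
          fprintf(stderr, "configuration failed (status 0x%x)\n", status);
      NT_Done();
      return status == NT_SUCCESS ? 0 : 1;
  }

The generated hash value is then used by multi-CPU distribution (see Multi-CPU distribution) to select the host buffer for each frame.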

Filtering

Filtering can be based on:

  • Port numbers
  • Pattern compares
  • Protocol information
  • Frame size tests
  • Frame error tests
  • IP address matching including address groups and wildcard matches
  • User-defined key tests that include arbitrary fields

Overlapping filters can be prioritized.

Filtering on tunneled IP traffic is available for GTPv0-U, GRE_v0, IPinIP and EtherIP tunnels.
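
A minimal sketch of a prioritized filter setup, assuming NTPL Assign statements with Port, Layer4Protocol and Priority options submitted through an NTAPI configuration stream (the statement syntax is an assumption based on common Napatech examples; in NTPL the lowest priority number wins when filters overlap):

  #include <nt.h>
  #include <stdio.h>

  int main(void)
  {
      NtConfigStream_t hCfg;
      NtNtplInfo_t info;
      int status = NT_Init(NTAPI_VERSION);

      if (status == NT_SUCCESS)
          status = NT_ConfigOpen(&hCfg, "filter_example");
      if (status == NT_SUCCESS) {
          /* TCP frames received on port 0 go to stream 1 ... */
          status = NT_NTPL(hCfg,
                           "Assign[StreamId=1; Priority=0] = Port == 0 AND Layer4Protocol == TCP",
                           &info, NT_NTPL_PARSER_VALIDATE_NORMAL);
          /* ... and all other port 0 traffic falls through to stream 2. */
          if (status == NT_SUCCESS)
              status = NT_NTPL(hCfg, "Assign[StreamId=2; Priority=1] = Port == 0",
                               &info, NT_NTPL_PARSER_VALIDATE_NORMAL);
          NT_ConfigClose(hCfg);
      }
      if (status != NT_SUCCESS)
          fprintf(stderr, "configuration failed (status 0x%x)\n", status);
      NT_Done();
      return status == NT_SUCCESS ? 0 : 1;
  }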

Packet coloring

The packet coloring functionality enables tagging of captured frames with a color ID based on the filter logic. The color ID can contain contributions from one filter with the highest priority (color) and/or contributions from a number of filters that the frame matches (color mask). Packet coloring can be used, for instance, in connection with multi-CPU distribution (see Multi-CPU distribution).
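
A minimal sketch of tagging frames with a color ID, assuming the NTPL Color option of the Assign command (the option name and values are assumptions based on common Napatech examples):

  #include <nt.h>
  #include <stdio.h>

  int main(void)
  {
      NtConfigStream_t hCfg;
      NtNtplInfo_t info;
      int status = NT_Init(NTAPI_VERSION);

      if (status == NT_SUCCESS)
          status = NT_ConfigOpen(&hCfg, "color_example");
      if (status == NT_SUCCESS) {
          /* Tag UDP frames delivered to stream 0 with color ID 7; the
             application reads the color from the packet descriptor. */
          status = NT_NTPL(hCfg, "Assign[StreamId=0; Color=7] = Layer4Protocol == UDP",
                           &info, NT_NTPL_PARSER_VALIDATE_NORMAL);
          NT_ConfigClose(hCfg);
      }
      if (status != NT_SUCCESS)
          fprintf(stderr, "configuration failed (status 0x%x)\n", status);
      NT_Done();
      return status == NT_SUCCESS ? 0 : 1;
  }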

Slicing

Slicing can be both dynamic and fixed, or disabled. These slicing modes are available:

  • Fixed length
  • Fixed length + ISL
  • Fixed length + ISL + ETH + VLAN
  • Fixed length + ISL + ETH + VLAN + MPLS
  • Fixed length + ISL + ETH + VLAN + MPLS + L3
  • Fixed length + ISL + ETH + VLAN + MPLS + L3 + L4
  • Fixed length + ISL + ETH + VLAN + MPLS + L3 + L4 + outer data type
  • Fixed length + ISL + ETH + VLAN + MPLS + L3 + L4 + outer data type + inner L3
  • Fixed length + ISL + ETH + VLAN + MPLS + L3 + L4 + outer data type + inner L3 + inner L4
  • End of frame

The end-of-frame dynamic offset enables bytes to be sliced off from the end of the frame by applying a negative offset. This can be used, for instance, for frame checksum removal.
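
A minimal sketch of configuring end-of-frame slicing, assuming an NTPL Slice command with a dynamic end-of-frame offset (the command and parameter names are assumptions based on typical Napatech examples; the negative offset strips the 4-byte frame checksum as described above):

  #include <nt.h>
  #include <stdio.h>

  int main(void)
  {
      NtConfigStream_t hCfg;
      NtNtplInfo_t info;
      int status = NT_Init(NTAPI_VERSION);

      if (status == NT_SUCCESS)
          status = NT_ConfigOpen(&hCfg, "slice_example");
      if (status == NT_SUCCESS) {
          /* Slice 4 bytes off the end of every frame (assumed parameter names). */
          status = NT_NTPL(hCfg, "Slice[Offset=-4; AddDynamicOffset=EndOfFrame] = All",
                           &info, NT_NTPL_PARSER_VALIDATE_NORMAL);
          NT_ConfigClose(hCfg);
      }
      if (status != NT_SUCCESS)
          fprintf(stderr, "configuration failed (status 0x%x)\n", status);
      NT_Done();
      return status == NT_SUCCESS ? 0 : 1;
  }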

Local retransmission (NEW)

The local retransmission functionality enables frames received on one network port to be retransmitted to the same port or to another network port on the same accelerator without involving the host CPU. The retransmitted frames can be expanded to include a trailer containing a 64-bit RX time stamp with a resolution of 1 ns.

Note: Local retransmission does not apply to NT200A01-2×100 nor to NT200A01-2×100/40 running at 2 × 100 Gbit/s.

Line loopback

Frames received on one network port can be retransmitted to the same network port without involving the host CPU. The line loopback functionality and the filtering/capturing functionality can be used independently of each other.

Host-based transmission

The full host-based transmission functionality enables high-speed, low-CPU-load transmission of frames located in a host buffer in the server application memory. Frames that have been received by an NT accelerator can be retransmitted by the same or a different accelerator without modification.
Note: The full host-based transmission functionality does not apply to NT200A01-2×100 running on the capture image.

Transmission can be static, to a single port, or dynamic, where the application assigns different TX ports to different frames.

User data, such as VLAN tags, can be inserted into the transmitted frames using dynamic descriptor 3.

Transmission can be timed so that frames are transmitted at specific points in time. In this way frames can be transmitted, for instance, according to their RX time stamps, so that they are replayed as captured. Timed transmission also allows synchronized replay of traffic from a number of different accelerators when their time stamp clocks are synchronized. (NEW)
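
A minimal sketch of host-based transmission through an NTAPI TX stream, transmitting a few minimal frames on port 0 (the port mask, frame size and option values are arbitrary example assumptions; timed transmission and per-frame TX port selection additionally use packet descriptor fields, which are not shown):

  #include <nt.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      NtNetStreamTx_t hTx;
      NtNetBuf_t hNetBuf;
      int status, i;

      if ((status = NT_Init(NTAPI_VERSION)) != NT_SUCCESS)
          return 1;
      /* Open a TX stream allowed to transmit on port 0 (bit 0 of the port mask). */
      status = NT_NetTxOpen(&hTx, "tx_example", 0x1, NT_NETTX_NUMA_ANY_HOSTBUFFER, 0);
      if (status != NT_SUCCESS) {
          NT_Done();
          return 1;
      }
      for (i = 0; i < 8 && status == NT_SUCCESS; i++) {
          /* Request a 64-byte TX buffer destined for port 0. */
          status = NT_NetTxGet(hTx, &hNetBuf, 0, 64, NT_NETTX_PACKET_OPTION_DEFAULT, -1);
          if (status == NT_SUCCESS) {
              uint8_t *frame = (uint8_t *)NT_NET_GET_PKT_L2_PTR(hNetBuf);
              memset(frame, 0, 64);    /* placeholder payload */
              memset(frame, 0xFF, 6);  /* broadcast destination MAC */
              NT_NetTxRelease(hTx, hNetBuf);  /* hand the frame to the accelerator */
          }
      }
      NT_NetTxClose(hTx);
      NT_Done();
      return status == NT_SUCCESS ? 0 : 1;
  }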

Limited host-based transmission

Limited host-based transmission only applies to NT200A01-2×100 running on the capture image. This host-based transmission is very CPU-intensive and has a very limited TX rate.

Multi-CPU distribution

Multi-CPU distribution enables the accelerator to off-load CPU load balancing by distributing the processing of captured frames across the host CPUs. Data can be placed in separate host buffers based on port numbers, hash values and filtering.
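
A minimal configuration sketch of distribution to multiple host buffers, assuming the NTPL Assign syntax with stream ID ranges (the statement text is an assumption based on common Napatech examples). The configured hash value (see Hash value generation) selects the stream within each range, and each stream ID is then serviced by its own RX stream, as in the time stamp sketch above:

  #include <nt.h>
  #include <stdio.h>

  int main(void)
  {
      NtConfigStream_t hCfg;
      NtNtplInfo_t info;
      int status = NT_Init(NTAPI_VERSION);

      if (status == NT_SUCCESS)
          status = NT_ConfigOpen(&hCfg, "distribution_example");
      if (status == NT_SUCCESS) {
          /* Spread port 0 traffic over streams 0-3 and port 1 traffic over
             streams 4-7, using the configured hash value within each range. */
          status = NT_NTPL(hCfg, "Assign[StreamId=(0..3)] = Port == 0",
                           &info, NT_NTPL_PARSER_VALIDATE_NORMAL);
          if (status == NT_SUCCESS)
              status = NT_NTPL(hCfg, "Assign[StreamId=(4..7)] = Port == 1",
                               &info, NT_NTPL_PARSER_VALIDATE_NORMAL);
          NT_ConfigClose(hCfg);
      }
      if (status != NT_SUCCESS)
          fprintf(stderr, "configuration failed (status 0x%x)\n", status);
      NT_Done();
      return status == NT_SUCCESS ? 0 : 1;
  }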

Buffer system

The buffer system supports up to 128 RX host buffers per accelerator in host memory when a dynamic host buffer segment size is used, and up to 64 RX host buffers when a static host buffer segment size is used.