Open vSwitch (OVS) is a robust, OpenFlow-based implementation of a virtual switch (vSwitch) that has continuously evolved with new features to become the most widely adopted vSwitch technology for server-based networking. OVS is the mainstream solution used in the market today for network virtualization and, more specifically, for de-multiplexing traffic from the network into and out of applications and Virtualized Network Functions (VNFs) running in Virtual Machines (VMs) or containers.
Software-only implementations of OVS fail to meet the performance and scale requirements for widespread deployments of virtualized applications. Open vSwitch imposes a significant CPU load simply to direct traffic into and out of VMs or containers. Napatech Link-Virtualization™ Software provides an open OVS offload solution running on Napatech SmartNICs that not only meets the performance goals of carriers, enterprises, and data center operators, but does so at the price point these market segments require.
The Napatech Link-Virtualization™ Software solution, deployed on Napatech's family of SmartNICs, provides a single software package to offload and accelerate OVS for virtualized environments. The transparent offload architecture replicates the OVS fast path on the SmartNIC, which greatly accelerates packet processing. The offload solution includes the FPGA firmware as well as the host-side drivers and user-space software.
The core value proposition of the Napatech Link-Virtualization™ Software solution is hardware-based Open vSwitch offload, with an almost 6x gain in performance compared to user-space OVS based on Data Plane Development Kit libraries (OVS DPDK) and more than 60x gain compared to the traditional OVS (i.e. kernel space packet processing). The performance increase results in significant reduction in CPU cycle consumption and Total Cost of Ownership (TCO). The solution forms a baseline used as the core for customized offload products.
Napatech Link-Virtualization™ Software 4.4 supports the following features:
- Full offload of OVS-DPDK datapath
- OVS 2.15.0
- DPDK 20.11.1
- Fastpath forwarding of traffic between specified vSwitch vPorts
- OVSDB, CLI, or local flow API for configuration
- Support for standard SDN controllers
- OVS stateful statistics
- Offload of 1024 megaflows (wildcarded flows)
- Support for millions of flows into the megaflow structure
- Extensive and configurable match processing for L2, L3, L4 packet headers
- VLAN, VXLAN, and Q-in-Q encapsulation/decapsulation
- Link Aggregation (active/active, active/standby)
- Traffic port mirroring
- 62 datapath VFs/VMs
- Virtio support
- VM live migration
- Jumbo frame support
- Supported on Red Hat Enterprise Linux 8
- 10/25GbE network interfaces on NT50B01 and NT200A02
- Receive Side Scaling (RSS)
- Quality of Service (QoS)
Open vSwitch Ports
The full offload of virtual ports (ports that are directly attached from hardware to a VM) in OVS-DPDK is achieved using the concept of a representor port, which acts as a normal DPDK port in OVS but receives only the non-offloaded traffic (i.e. traffic subject to SW fallback processing in the OVS user-space datapath). New flows that are classified and cached in the OVS megaflow cache by the representor port PMD may be offloaded to hardware. The representor port receives statistics from all offloaded flows and updates its OVS megaflow statistics accordingly.
Multi-queue RSS is supported by the Napatech Link-Virtualization™ Software. RSS allows a VM to scale datapath traffic across multiple CPU cores; when an OVS switch performs SW switching, the PMDs used by OVS typically need to be scaled across CPU cores as well. With HW offload, however, OVS itself does not need to be configured for RSS; only the direct datapath connection from the SmartNIC to the VM does.
A mirror port may be created in OVS as a part of the OVSDB configuration to forward ingress and/or egress traffic of a certain port to an alternative destination (i.e. the mirror port). Mirror ports can also be fully offloaded in a seamless and automatic way by the Napatech Link-Virtualization™ Software.
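As a sketch of how such a mirror might be configured through OVSDB, the standard ovs-vsctl mirror commands apply unchanged (the bridge and port names br0, eth0, and eth1 are placeholders for this example):

```shell
# Mirror all traffic seen on port eth0 of bridge br0 to port eth1.
ovs-vsctl -- set Bridge br0 mirrors=@m \
    -- --id=@eth0 get Port eth0 \
    -- --id=@eth1 get Port eth1 \
    -- --id=@m create Mirror name=mymirror \
         select-src-port=@eth0 select-dst-port=@eth0 output-port=@eth1
```

With the offload active, the duplicated traffic is produced by the SmartNIC rather than by the OVS PMD.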
Open vSwitch (OVS) offload integration
The Napatech Link-Virtualization™ Software supports OVS in the DPDK-compiled version. It uses an extended version of the partial offload API in OVS-DPDK to support full offload. This solution allows Open vSwitch megaflows to be transparently offloaded to the Napatech SmartNIC hardware. This means that no special configuration is needed to activate the individual offload functions; offload happens seamlessly and transparently.
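Because the offload is transparent, the bridge is set up exactly as for stock OVS-DPDK. The commands below are a minimal sketch with placeholder port names and PCI address; no offload-specific options are required:

```shell
# Standard OVS-DPDK bridge using the userspace (netdev) datapath.
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
# Attach a physical SmartNIC port (the PCI address is a placeholder).
ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
    options:dpdk-devargs=0000:01:00.0
# Attach a vhost-user port for a VM.
ovs-vsctl add-port br0 vhost0 -- set Interface vhost0 type=dpdkvhostuser
```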
When offloading the Open vSwitch MFC (Megaflow Cache), the user-space datapath remains active to handle any flow that could not be offloaded to hardware. The switching performance in these scenarios is therefore the same as without offload. This fallback mechanism guarantees that any non-offloaded flow still benefits from the full performance of OVS DPDK.
The solution attempts to offload every new OVS megaflow to the Napatech SmartNIC. If successful, all subsequent packets matched by the offloaded megaflow have their actions processed in hardware and are sent directly to their destination, i.e. a VM port's VirtIO queue or a physical TX port.
Statistics for all hardware-switched packets are counted in hardware and periodically reported to the Open vSwitch PMD. This means the Open vSwitch statistics correctly represent both full-offload and non-offload megaflows at all times.
There are different levels of offload depending on the following scenarios:
- Megaflow match could not be applied in hardware: no offload
- Megaflow match could be offloaded into hardware, but not all actions were supported: partial offload
- Megaflow match and all actions were fully offloaded into hardware: full offload
Tunnel Offload Support
The Napatech Link-Virtualization™ Software supports the following tunneling protocols:
- L2 tunneling: IEEE 802.1Q VLAN and QinQ
- L3 tunneling: VXLAN
These tunnels are automatically offloaded when configured and set up in Open vSwitch. The hardware performs the corresponding tunnel pops and pushes.
VXLAN offload mimics the SW processing: the ingress offload first performs a pop and recirculation, followed by an inner-header + VNI match, while egress is performed as a push without the need for recirculation. In the case of an inner-header + VNI miss, the popped frame and its VNI are sent to the host for SW-fallback processing by OVS DPDK.
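A VXLAN tunnel is configured in the standard Open vSwitch way; as a sketch (the remote address and VNI value are illustrative):

```shell
# VXLAN tunnel port on bridge br0; remote_ip and key (the VNI) are examples.
ovs-vsctl add-port br0 vxlan0 -- set interface vxlan0 type=vxlan \
    options:remote_ip=192.0.2.2 options:key=5000
```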
OpenFlow Support
OpenFlow is a native part of Open vSwitch and is thus fully supported in OVS. The latest version currently supported is OpenFlow 1.4.
The following match fields supported by Open vSwitch are offloaded into hardware:
- Ethernet source address
- Ethernet destination address
- Ethernet type
- VLAN TPID + TCI
- IPv4 version/IHL
- IPv4 TOS
- IPv4 next protocol
- IPv4 TTL
- IPv4 fragment offset
- IPv4 source address
- IPv4 destination address
- IPv6 protocol
- IPv6 version/traffic class/flow label
- IPv6 hop limit
- IPv6 source address
- IPv6 destination address
- UDP/TCP/SCTP layer source port
- UDP/TCP/SCTP layer destination port
- TCP data offset
- TCP flags
- VXLAN VNI
- Ingress port
OVS actions supported in hardware for full offload
Full offload requires all specified actions of the megaflow to be supported by hardware. The following actions are currently supported for offload:
- Output to one or two ports (i.e. two in the mirror case), physical and/or virtual
- Push VLAN
- Set VLAN vid
- Pop VLAN
- Push tunnel header (VXLAN)
- Pop tunnel header (VXLAN)
- Recirculate with alternate in-port
When an Open vSwitch bridge contains ports participating in different L2 VLAN segments, this information is normally added in the OVSDB configuration for each port. In that case, these VLAN actions are automatically applied by Open vSwitch to the final megaflow match-action pairs. Similarly, when a VXLAN port is configured in an Open vSwitch bridge in the normal way, the VXLAN header is automatically added/stripped in the final megaflow match-action pairs for all flows passing through the tunnel. This is then automatically offloaded into hardware using the VXLAN hardware-offload functionality added to Open vSwitch.
Partial Offload – Classification only
Partial OVS offload is a recently accepted addition to the open source OVS-DPDK standard. Partial offload provides the necessary constructs for offloading the packet classification to a compatible SmartNIC.
Partial offload uses a marker scheme to reduce the CPU processing required for packet classification, i.e. the frame decode and megaflow match as shown in the following figure.
The SmartNIC is programmed with a unique identifier called a mark, and a search pattern for each offloaded megaflow. Packets that match a specific search pattern are tagged with its associated mark. The packet and the mark (placed in a packet header via metadata) are delivered to the host. The OVS port PMD on the host can then use the mark to directly look up the OVS actions in its megaflow index table thus saving classification in SW. All actions are performed in OVS SW and the packet is delivered to its OVS egress port by the PMD.
Partial offload can only be used for packets received on a physical port (i.e. northbound traffic). Any packet not matched in the SmartNIC is passed up without a mark and is then classified by the PMD.
The NIC and the megaflow index table are programmed (via the rte_flow library in DPDK) using a tap on OVS megaflow creation following a call to the slow path (i.e. ovs-vswitchd).
Full Offload – Classification and Actions
Full offload for OVS DPDK is built on an extension of its upstreamed partial offload functionality. The extension adds new concepts to support both megaflow match and action offload in a compatible SmartNIC as shown in the following figure.
As of DPDK 20.11, a small but growing subset of megaflow actions are supported in the upstreamed version of OVS DPDK.
To be able to utilize full offload, a virtual port in the SmartNIC must be defined in the PMD and tied to an OVS port.
If the SmartNIC cannot offload all actions defined in a megaflow, the OVS representor port PMD executes the match and actions. If the destination port is fully offloaded, the packet is returned to the SmartNIC for direct delivery to the destination (bypassing classification and actions); otherwise, the PMD delivers the packet to the destination in host memory.
Quality of Service
OVS DPDK supports rate-limiting (policer) Quality of Service (QoS) control for all vSwitch ports. Napatech OVS DPDK offloaded (i.e. representor) ports support both ingress and egress policing. Traffic from a fully offloaded VM is policed against the bandwidth thresholds configured for the port. The rate-limiting configuration consists of a maximum average bandwidth (ingress_policing_rate) and a maximum burst bandwidth (ingress_policing_burst). When a fully offloaded port is about to exceed one of these limits, the Napatech Link-Virtualization™ Software OVS DPDK offload solution drops packets from this port until new bandwidth credits are available (i.e. in the same way as OVS DPDK software).
Multi-queue RSS ports are policed at the port level (i.e. metering the aggregate of the port queues).
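Using the ingress_policing_rate and ingress_policing_burst parameters named above, a port policer can be sketched as follows (the interface name and values are illustrative; units are kbps and kb, respectively):

```shell
# Limit vhost0 to a 10 Mbps average rate with a 1 Mb burst allowance.
ovs-vsctl set interface vhost0 ingress_policing_rate=10000
ovs-vsctl set interface vhost0 ingress_policing_burst=1000
```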
Link Aggregation
The following Link Aggregation (LAG) modes are supported:
- Active-Passive: this LAG configuration provides redundancy. If the active link drops, the other link takes over and traffic continues.
- Active-Active: this LAG configuration provides increased throughput. Packets are received from all ports in the bond and merged into the bond queue. On transmit, packets are sent out on one of the ports in the bond selected by a flow hash key (comparable with static IEEE 802.3ad balance-tcp), so that a given flow is always sent out on the same physical link.
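As a sketch, the two bonding modes above map to standard OVS bond configuration (port names are placeholders; note that balance-tcp normally requires LACP negotiation with the link partner):

```shell
# Active-Active: hash-based load balancing across both links.
ovs-vsctl add-bond br0 bond0 dpdk-p0 dpdk-p1 \
    -- set port bond0 bond_mode=balance-tcp lacp=active
# Active-Passive: redundancy only, one link carries traffic at a time.
ovs-vsctl set port bond0 bond_mode=active-backup
```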