A DOCSIS Dataplane Comparison Across Intel's Processor Generations
Cable operators have long relied on a monolithic Cable Modem Termination System (CMTS) to supply their customers with high-speed data services. However, customer bandwidth demands are continually increasing without showing signs of slowing down. In many ways, these demands are outpacing the limits of what traditional CMTSs offer. Many believe the right approach to solving this problem is a transition to a virtualized architecture.
Acquiring traditional purpose-built hardware that consumes large amounts of space and power could become a thing of the past. Instead, the CMTS processing components can be virtualized on common x86 COTS (commercial-off-the-shelf) servers, increasing adaptability, scalability, performance and power efficiency.
At World Wide Technology, the Advanced Technology Center (ATC) recently demonstrated the packet processing capabilities of Intel's Ice Lake CPUs within a vCMTS with the release of our DOCSIS Dataplane Performance Lab, the first on-demand lab of its kind. The lab uses v21.10 of Intel's vCMTS Reference Dataplane platform along with two SuperMicro SYS-220HE-FTNR Hyper-E Edge servers to measure the dataplane throughput of Xeon-based processors, mimicking the traffic flow one would typically see in a vCMTS environment. Intel's vCMTS Reference Dataplane was designed to help cable operators gauge the performance and power efficiency of Xeon-based CPUs and to highlight the potential of a virtualized architecture.
The release of this lab provided the perfect opportunity to explore and compare two separate vCMTS dataplane environments and better understand the generational differences between two of Intel's CPU microarchitectures, code-named Ice Lake and Skylake.
Results of this comparison indicate that Intel's line of 3rd Generation Xeon® SP (Ice Lake) chips demonstrates superior packet processing capabilities compared to its predecessor, with the capacity to support twice as many customer service groups. With the added focus on accelerating cryptographic functions within Ice Lake CPUs, Intel's new line of chips is well prepared not only to handle dataplane packet processing but to excel at it.
CPU comparison
Intel has made tremendous improvements to its 3rd Generation Xeon® SP processors by adding features that increase the efficiency of processing demanding workloads compared to previous generations. Along with improved cache size, core density and memory channels, some of the most notable changes in this latest microarchitecture are the enhancements to cryptographic processing. With Intel's Crypto Acceleration package, this new line of processors is equipped with built-in Vector AES-NI, Vector CLMUL, Intel Secure Hash Algorithm Extensions, VPMADD52 instructions and accelerated RSA/DH encryption. These new instructions effectively eliminate the impact of processing full data encryption workloads, offering up to 1.48x faster encryption processing than previous generations. These changes are significant because DOCSIS dataplane packet processing relies heavily on encryption and CRC generation, which are known to consume a large portion of CPU cycles.
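The presence of these instruction set extensions can be verified directly on a Linux host. The short Python sketch below is an illustration only (not part of the Reference Dataplane) and assumes a Linux system exposing /proc/cpuinfo; it checks for the flags that correspond to the features listed above.

```python
# Illustrative sketch: check /proc/cpuinfo for Ice Lake crypto-acceleration flags.
# Assumption: Linux host. Flag names are the standard Linux cpuinfo spellings.
FLAGS = {
    "vaes": "Vector AES-NI",
    "vpclmulqdq": "Vector CLMUL",
    "sha_ni": "SHA Extensions",
    "avx512ifma": "VPMADD52 (AVX-512 IFMA)",
}

cpu_flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            cpu_flags = set(line.split(":", 1)[1].split())
            break

for flag, name in FLAGS.items():
    status = "present" if flag in cpu_flags else "absent"
    print(f"{name:<28} ({flag}): {status}")
```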
How Intel's vCMTS Reference Dataplane works
Version 21.10 of Intel's vCMTS Reference Dataplane platform runs within a container-based environment orchestrated entirely by Kubernetes across two nodes: the Packet Generator/Controller (PKT-GEN) and the vCMTS Dataplane (vCMTSD). The Kubernetes controller reserves CPU cores and orchestrates both nodes to deploy small computing units known as Pods. These Pods host containerized instances, which in our case simulate DOCSIS traffic. The remaining CPU cores are allocated to telemetry, OS and scheduling functions.
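As a rough illustration of that layout, the sketch below uses the standard Kubernetes Python client to list the Pods in a namespace and the CPU each container requests. The "vcmts" namespace is an assumption for illustration; an actual deployment may use a different namespace and labels.

```python
# Illustrative sketch: list Pods and their CPU requests via the Kubernetes API.
# Assumption: a kubeconfig for the cluster is available; namespace is hypothetical.
from kubernetes import client, config

config.load_kube_config()        # use the cluster's kubeconfig
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod(namespace="vcmts")
for pod in pods.items:
    for container in pod.spec.containers:
        requests = container.resources.requests or {}
        print(f"{pod.metadata.name}/{container.name}: cpu={requests.get('cpu', 'n/a')}")
```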
Kubernetes Pods created by the vCMTSD node are used to represent individual service groups and contain instances of the upstream (US) and downstream (DS) dataplane pipelines. Each CPU core on the vCMTSD node is assigned a Pod that is responsible for processing all subscriber traffic for its respective service group. An equal number of Pods are created on the PKT-GEN node, containing DPDK Pktgen-based instances. These instances are responsible for generating iMix2-type traffic for the corresponding vCMTSD instances to process. iMix (or Internet MIX) refers to typical traffic that traverses networking equipment and resembles traffic that would be seen in real-world conditions. Performance metrics are then collected by containers on the vCMTSD node running the collectd daemon, which stores the data in a Prometheus time-series database. This data is displayed on a Grafana dashboard using Prometheus-based queries.
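The same Prometheus data that feeds the Grafana dashboard can be pulled programmatically over the Prometheus HTTP API. The sketch below is a minimal example under stated assumptions: the metric name "vcmts_downstream_bits_per_second", the "service_group" label and the URL are placeholders, not the actual names exported by the Reference Dataplane's collectd pipeline.

```python
# Illustrative sketch: query per-service-group throughput from Prometheus.
# Assumptions: Prometheus reachable on localhost:9090; metric and label names
# are hypothetical stand-ins for whatever the collectd exporter publishes.
import requests

PROMETHEUS_URL = "http://localhost:9090/api/v1/query"
query = 'sum by (service_group) (vcmts_downstream_bits_per_second)'

resp = requests.get(PROMETHEUS_URL, params={"query": query}, timeout=10)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    sg = result["metric"].get("service_group", "unknown")
    _, value = result["value"]          # [timestamp, value-as-string]
    print(f"service group {sg}: {float(value) / 1e9:.2f} Gbps")
```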
Competing environments
Skylake dataplane environment
Both vCMTSD and PKT-GEN Servers are equipped with:
- Intel S2600WFS Server Boards
- 2x Intel Xeon Platinum 8710 SP CPUs
- 2x Intel E810-CQDA2 100G
- 192 GB of RAM
- 2x 800GB SSD
The Skylake environment utilizes two of Intel's S2600WFS Server Boards. Each server contains 2x Intel Xeon® Platinum 8710 Scalable Processors, 192 GB of RAM, and 2x Intel E810-CQDA2 100G network adapters connected directly to one another using four 100G links. The P-State settings on the vCMTSD server have also been configured to pin all CPU cores to the maximum base frequency of 2.1 GHz to ensure full core utilization.
Ice Lake dataplane environment
Both vCMTSD and PKT-GEN Servers are equipped with:
- SuperMicro SYS-220HE-FTNR Hyper-E Edge
- 2x Intel Xeon Platinum 8368 SP CPUs
- 2x Intel E810-2CQDA2 200G
- 256 GB of RAM
- 2x 1TB SSD
The Ice Lake environment uses two of SuperMicro's SYS-220HE-FTNR Hyper-E Edge servers. Each server is equipped with 2x Intel Xeon® Platinum 8368 Scalable Processors, 256 GB of RAM, and 2x Intel E810-2CQDA2 200G network adapters connected directly to one another using four 100G links. The P-State settings on the vCMTSD server have also been configured to pin all CPU cores to the maximum base frequency of 2.4 GHz to ensure full CPU utilization per core.
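For reference, frequency pinning of this kind can be done through the Linux cpufreq sysfs interface. The sketch below is a minimal illustration, assuming a Linux host that exposes cpufreq controls and root privileges; the Reference Dataplane ships its own setup tooling, so this is not its actual mechanism.

```python
# Illustrative sketch: pin every core to a fixed frequency via cpufreq sysfs.
# Assumptions: Linux host with cpufreq exposed (e.g. intel_pstate), run as root.
# 2_100_000 kHz would be used for the Skylake environment, 2_400_000 for Ice Lake.
import glob

TARGET_KHZ = 2_400_000   # 2.4 GHz, the Ice Lake base frequency

for policy in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq"):
    # Set the ceiling before the floor so min never exceeds max.
    for knob in ("scaling_max_freq", "scaling_min_freq"):
        with open(f"{policy}/{knob}", "w") as f:
            f.write(str(TARGET_KHZ))
    with open(f"{policy}/scaling_governor", "w") as f:
        f.write("performance")
```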
What factors influence performance output?
CPU
Within these virtualized environments, CPU cores are individually assigned to specific tasks, eliminating unused computing power. The greater the number of available CPU cores, the more service group pods can be created and the more subscribers can be serviced within a single machine, demonstrating opportunities for scalability and flexibility while keeping cost to a minimum. Furthermore, the features included in the particular CPU SKU, such as I/O, memory capacity and AVX-512 capabilities, that aim to optimize workloads will drastically affect performance as well.
Network adapter
The throughput capabilities of the connected network adapter obviously play a large role in determining the amount of data the system can process at a given time. In addition, the number of E810 controllers on the installed network adapters determines the number of Virtual Functions (VFs) that can be associated with each Physical Function (PF). VFs are PCIe functions that process I/O and map to a physical device, allowing virtual machines, or in our case service group pods, to communicate with other virtual or physical devices. The Reference Dataplane environment assigns two VFs to each vCMTSD service group pod, which are used to support both US and DS traffic flow. The more VFs a card can support, the more opportunities for service group expansion.
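On Linux, VFs are typically carved out of a PF through the standard SR-IOV sysfs interface. The Python sketch below is a minimal illustration of that mechanism; the interface name "ens801f0" and the VF count are hypothetical, and the Reference Dataplane's setup scripts handle VF creation automatically.

```python
# Illustrative sketch: create SR-IOV VFs on one E810 PF via sysfs.
# Assumptions: Linux host, run as root; PF name and VF count are hypothetical.
from pathlib import Path

PF_INTERFACE = "ens801f0"   # hypothetical PF name for one E810 port
NUM_VFS = 8                 # e.g. 4 service groups x 2 VFs (US + DS)

sriov = Path(f"/sys/class/net/{PF_INTERFACE}/device/sriov_numvfs")
sriov.write_text("0")            # clear any existing VFs first
sriov.write_text(str(NUM_VFS))   # then create the desired number

total = Path(f"/sys/class/net/{PF_INTERFACE}/device/sriov_totalvfs").read_text()
print(f"{PF_INTERFACE}: requested {NUM_VFS} VFs (device supports {total.strip()})")
```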
For the Skylake environment, the dual ports on each of the attached E810-CQDA2 network adapters support a combined throughput of 100G per card. These network cards support PCIe 4.0; however, because the installed CPU only supports PCIe 3.0, only 48 lanes can be used. Each E810-CQDA2 card contains a single E810 controller, which allows each connection on the two attached dual-port adapters to support the flow of traffic for four service group pods apiece, assigning eight VFs per port.
The installed E810-2CQDA2 adapters within the Ice Lake environment, however, can take advantage of the increased bandwidth provided by the PCIe 4.0-compatible Ice Lake CPUs. To access the 200G speeds this card offers, the PCIe lanes have been bifurcated, separating the occupied x16 slot into two separate x8 links. This allows each port to act as an independent physical adapter and deliver 100G per port, enabling higher amounts of throughput. Due to the additional E810 controller found on the E810-2CQDA2, each connection on the two equipped dual-port network adapters supports the flow of traffic for eight service group pods apiece, assigning 16 VFs per port, effectively doubling the number of VFs per PF compared to Skylake's E810-CQDA2.
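A quick back-of-the-envelope calculation shows why this matters. The sketch below is only an approximation: it accounts for 128b/130b line encoding but ignores further PCIe protocol overhead.

```python
# Rough PCIe bandwidth estimate: why PCIe 4.0 x8 links can each feed a 100G port.
GT_PER_LANE = {"gen3": 8.0, "gen4": 16.0}   # giga-transfers per second per lane
ENCODING = 128 / 130                         # usable fraction after 128b/130b

def usable_gbps(gen: str, lanes: int) -> float:
    return GT_PER_LANE[gen] * ENCODING * lanes

print(f"PCIe 3.0 x16: {usable_gbps('gen3', 16):.0f} Gbps (short of 200G per card)")
print(f"PCIe 4.0 x8 : {usable_gbps('gen4', 8):.0f} Gbps (enough for one 100G port)")
print(f"PCIe 4.0 x16: {usable_gbps('gen4', 16):.0f} Gbps (enough for two 100G ports)")
```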
Results
16 service group comparison
The figure below describes the maximum US, DS, and combined throughput of both environments running 16 service group vCMTSD pods, along with the combined power usage of the CPUs. Each test group is separated by the number of subscribers that are issued per service group.
Comparing both environments side by side, the Ice Lake environment shows a dramatic increase in throughput over its Skylake counterpart while consuming less power.
32 service groups with Ice Lake
As mentioned above, our Ice Lake environment can support up to 32 service groups because of the increased core density of the 3rd Gen CPUs and the additional E810 controllers on the connected network adapters. Using the default service group settings (see Appendix A) on the Reference Dataplane, we were able to achieve over 300 Gbps in total throughput across 32 service groups.
These results demonstrate that a single 2RU machine could theoretically support roughly 9.4 Gbps of bandwidth per service group (300 Gbps spread across 32 service groups). Comparing these results to the graph above, the Ice Lake environment more than doubles the possible throughput of its Skylake counterpart while still being more power-efficient.
Conclusion
Many cable operators appear hesitant about transitioning to a virtual architecture, viewing these solutions as too complex, unproven and difficult to operate. Here at WWT, we do not believe that is the case; these are common misconceptions. While this technology is still in its infancy, software-based solutions are innovating at a dramatically faster rate than traditional purpose-built hardware. Here, we have demonstrated impressive performance results that were not possible only a few years ago.
Without a doubt, virtualized solutions are the future of service delivery and provide an exceptional number of benefits, including increased scalability, flexibility, manageability and cost-effectiveness. This article aims to show what a virtualized environment can offer and to highlight the impact a few generations of hardware improvements can have on performance.