Andy Bechtolsheim (Arista Co-Founder) – Keynote (Jan 2020)
The Evolution of Cloud Data Centers: Disaggregated Storage and High-Speed Connectivity
—
Introduction
The landscape of cloud data centers is undergoing a transformative evolution, driven by the exponential growth of cloud computing and the surging demand for high-speed connectivity. Billions of devices now rely on cloud-hosted services, necessitating massive data center expansion. This article covers the pivotal advances shaping this domain: the rise of disaggregated storage, rapid Ethernet speed transitions, and the challenges and solutions emerging in network and storage technology, with the most significant developments and their implications presented first.
The Rise of Disaggregated Storage in Cloud Data Centers
Key Developments and Challenges
Flash and distributed storage systems have driven a surge in intra-data-center, server-to-server traffic. To support these data demands, cloud data centers are transitioning to speeds of up to 400 gigabits per second. They use leaf-spine network architectures, which keep server-to-server distances uniform and performance consistent. Data centers within a metropolitan area are also interlinked into clusters for redundancy and scalability.
Ethernet speed transitions have also been rapid, moving from 40 gigabit to 100 gigabit ports, driven by cloud adoption and the need for cost-effective technology. Advances in silicon, following Moore’s Law, continue to shape network silicon development, although it lags behind CPUs and GPUs. The dramatic decrease in both the cost and size of network switches has made data centers more efficient.
Implications for Storage and Connectivity
Disaggregated storage, which separates data storage from the applications that use it, improves reliability and scalability. Protocols such as RDMA over Converged Ethernet (RoCE) and NVMe over TCP (NVMe/TCP) play crucial roles here. Storage servers can already saturate 100 gigabit links, and keeping network utilization below 50% is vital to avoid latency issues. Adoption of 100 gigabit and faster NICs, including 200 gigabit, is driven mainly by economics: the cost must be balanced against the performance gain.
Silicon Photonics Optical Interface Evolution:
The optical interface landscape is undergoing a significant transition. The majority of current optics standards are based on 25 gig NRZ technology, which limits SFP, QSFP, and OSFP modules to 25 gigabits per lane. The industry is moving toward 100 gig single-laser optics (100 gig DR), which offer cost advantages over 50 gig optics. This will take SFP, QSFP, and OSFP modules from 25 gig to 100 gig per lane.
Additionally, 400 gig optics (DR4, FR4, LR4) and 800 gig optics (SR8, DR8, dual FR4, FR8) will be based on 100 gig lambda technology. Moving beyond 800 gig is harder: at 800 gig, the time a minimum-size packet occupies the wire (about 666 picoseconds) already lines up with the internal clock period of the switch silicon. Reaching 1,600 gigabit Ethernet would therefore require two parallel packet-processing pipelines, which creates scalability problems at the silicon level.
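To make that packet-timing argument concrete, here is a back-of-the-envelope sketch of how long a minimum-size Ethernet frame occupies the wire at different port speeds. The 64-byte minimum frame and the 20 bytes of preamble plus inter-frame gap are standard Ethernet figures; the ~666 picosecond number quoted above falls between the two columns at 800 gig.

```python
# Serialization time of a minimum-size Ethernet frame at various port speeds.
# "bare" counts only the 64-byte frame; "on the wire" adds the 8-byte preamble
# and 12-byte inter-frame gap that also consume link time.

MIN_FRAME_BYTES = 64
OVERHEAD_BYTES = 8 + 12  # preamble + inter-frame gap

for speed_gbps in (100, 400, 800, 1600):
    t_bare_ps = MIN_FRAME_BYTES * 8 / (speed_gbps * 1e9) * 1e12
    t_wire_ps = (MIN_FRAME_BYTES + OVERHEAD_BYTES) * 8 / (speed_gbps * 1e9) * 1e12
    print(f"{speed_gbps:>4} Gb/s: {t_bare_ps:6.0f} ps bare, {t_wire_ps:6.0f} ps on the wire")
```

At 800 gig a minimum-size packet takes well under a nanosecond, roughly one packet per internal clock cycle of the switch chip; at 1,600 gig the chip would have to handle two packets per cycle, hence the two parallel pipes.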
The Future of Cloud Data Center Networking
Emerging Technologies and Standards
Emerging technologies such as PCIe 4.0 and 5.0 significantly increase server I/O bandwidth, which matters for both AI and conventional workloads. Switch silicon such as the Tomahawk 3 chip has dramatically increased switching bandwidth, enabling more efficient data center configurations. Adopting 100 gigabit and 400 gigabit optics is essential to keep pace with these rising speed demands.
PCIe Generation Evolution:
PCIe 3.0 has been the standard for nearly a decade, offering 8 gigabits per second per lane. PCIe 4.0 doubles that to 16 gigabits per lane, enabling faster I/O, faster network interfaces, and better support for GPUs. PCIe 5.0 doubles it again to 32 gigabits per lane, providing even more I/O bandwidth for conventional servers and AI GPU servers.
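For a sense of scale, the sketch below converts those per-lane rates into approximate usable bandwidth for a x16 slot; the 128b/130b encoding overhead applies to all three generations, and the resulting figures are rounded.

```python
# Approximate usable bandwidth of a x16 slot for recent PCIe generations.
# PCIe 3.0/4.0/5.0 all use 128b/130b encoding, so usable rate ≈ raw GT/s * 128/130.

GENERATIONS = {"PCIe 3.0": 8, "PCIe 4.0": 16, "PCIe 5.0": 32}  # GT/s per lane
LANES = 16
ENCODING_EFFICIENCY = 128 / 130

for gen, gtps in GENERATIONS.items():
    per_lane_gbps = gtps * ENCODING_EFFICIENCY      # usable gigabits per lane
    slot_gbytes_per_s = per_lane_gbps * LANES / 8   # gigabytes per second, x16 slot
    print(f"{gen}: {per_lane_gbps:5.2f} Gb/s per lane, ~{slot_gbytes_per_s:4.1f} GB/s per x16 slot")
```

One consequence: a 200 gigabit NIC needs roughly 25 GB/s of host bandwidth, which a PCIe 4.0 x16 slot can supply but a PCIe 3.0 x16 slot (about 16 GB/s) cannot.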
Network Interconnect Efficiency:
Top-of-rack switches have evolved significantly, with the latest Tomahawk 3 chip offering four times the bandwidth of its predecessor. This makes it possible to interconnect a large number of servers at very high speed and low cost using copper cables.
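For scale, the sketch below converts the Tomahawk 3's 12.8 terabits per second of switching capacity into port counts at common speeds; the breakdowns are simple divisions for illustration, not specific product SKUs.

```python
# Port counts implied by a 12.8 Tb/s switch chip (e.g. Tomahawk 3) at common port speeds.
CHIP_CAPACITY_GBPS = 12_800

for port_speed_gbps in (400, 200, 100):
    ports = CHIP_CAPACITY_GBPS // port_speed_gbps
    print(f"{ports:>3} ports at {port_speed_gbps} Gb/s")
```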
Disaggregating Data Placement for Scalability:
Disaggregating data placement from the application is crucial in the cloud to ensure scalability and flexibility. The spine tier, the top tier of the data center network, determines total throughput and can reach multiple petabits per second in large data centers. Any server can reach any other server within a few microseconds through this fabric, enabling efficient remote data access.
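A back-of-the-envelope latency budget shows where the "few microseconds" comes from; the cable length and per-hop switch latency below are illustrative assumptions, not figures from the talk.

```python
# Rough one-way server-to-server latency across a leaf-spine fabric.
# All numbers are illustrative assumptions.

FIBER_NS_PER_M = 5        # ~5 ns per meter (speed of light in fiber)
PATH_LENGTH_M = 300       # assumed cable distance across a large data center hall
HOPS = 3                  # leaf -> spine -> leaf
SWITCH_LATENCY_NS = 500   # assumed per-hop switching latency

one_way_ns = PATH_LENGTH_M * FIBER_NS_PER_M + HOPS * SWITCH_LATENCY_NS
print(f"one-way fabric latency ≈ {one_way_ns / 1000:.1f} µs")
```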
Storage Protocols and Performance:
RDMA over Converged Ethernet (RoCE) has been widely adopted for disaggregated storage, with two approaches to congestion management: priority flow control (PFC) and explicit congestion notification (ECN). PFC guarantees that no packets are dropped, which gives optimal performance, but it does not scale well in larger networks. ECN-based congestion control is the newer approach; it avoids the need for PFC, but its scalability is still being evaluated.
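To illustrate the ECN-based approach in the abstract, here is a minimal sketch of a sender that backs off multiplicatively when its acknowledgments carry congestion marks and otherwise probes upward additively. It is a simplified AIMD loop for intuition only; production RoCEv2 deployments use DCQCN, which is considerably more elaborate.

```python
# Minimal ECN-driven rate-adaptation sketch (illustrative only, not DCQCN).

def adjust_rate(rate_gbps: float, ecn_marked: bool,
                line_rate_gbps: float = 100.0,
                decrease_factor: float = 0.5,
                increase_step_gbps: float = 5.0) -> float:
    """Return the sender's new rate after one feedback interval."""
    if ecn_marked:
        # The switch marked packets instead of dropping them: back off multiplicatively.
        return max(rate_gbps * decrease_factor, 1.0)
    # No congestion feedback: probe for more bandwidth additively.
    return min(rate_gbps + increase_step_gbps, line_rate_gbps)

# Example: congestion feedback arrives during intervals 3-5, then clears.
rate = 100.0
for interval in range(12):
    rate = adjust_rate(rate, ecn_marked=3 <= interval <= 5)
    print(f"interval {interval:2d}: {rate:5.1f} Gb/s")
```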
NVMe over TCP is a newer standard that offers advantages in reliability and ease of deployment. It leverages TCP's reliable transport layer to guarantee packet ordering and retransmission. Implementations are under way, and hardware optimizations are being explored for further performance improvements.
Performance Considerations:
Storage servers can saturate 100 gig links, and conventional servers can saturate at least 50 gig interfaces. Network utilization should be kept below 50% to avoid latency and congestion issues. Simulations using NS3 have shown that increasing network interface speeds improves overall performance.
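The 50% guideline can be motivated with a textbook queueing argument: in an M/M/1 model, mean delay grows as 1/(1 − utilization), so it doubles at 50% load and climbs steeply past 80-90%. The sketch below tabulates that relation; it is a simple model for intuition, not a reproduction of the NS3 studies.

```python
# Relative queueing delay versus link utilization (M/M/1 model).
# Mean time in system is proportional to 1 / (1 - rho), normalized to an idle link.

for rho in (0.1, 0.3, 0.5, 0.7, 0.8, 0.9, 0.95):
    delay_factor = 1.0 / (1.0 - rho)
    print(f"utilization {rho:4.0%}: delay x{delay_factor:4.1f}")
```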
Challenges and Considerations
With the trend towards fewer CPU sockets, the relevance of NUMA-aware networking is diminishing, but it still plays a role in maximizing memory bandwidth. Emerging interfaces like CXL and Gen Z could impact networking architectures, particularly for applications benefiting from large shared memory environments. Studies conducted by Ariel Handel and Pallavi Shripali at Facebook highlight the importance of simulations in optimizing network performance.
Future of Server Networking:
Disaggregated Storage:
Disaggregated storage using RoCE-style or NVMe over TCP protocols is now functional in large cloud networks, though it requires careful tuning to achieve optimal performance. Easy-to-use disaggregated storage is expected to become widely available in the near future.
Two-Layer Networks:
Traditional two-layer core/edge network designs are problematic. Leaf-spine architectures built from larger switches can support a very large number of servers. Each additional layer introduces more problems, and down-speed conversion (stepping from faster to slower links) can cause network issues. Overprovisioning the network is recommended to minimize these issues.
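A rough sizing sketch shows how far a two-tier leaf-spine fabric reaches; the switch radix, uplink count, and oversubscription below are illustrative assumptions, not a specific design.

```python
# How many servers a two-tier leaf-spine fabric can reach (illustrative numbers).

LEAF_PORTS = 128        # ports per leaf switch
SPINE_PORTS = 128       # ports per spine switch
UPLINKS_PER_LEAF = 32   # leaf ports reserved for spine uplinks (one per spine)

downlinks_per_leaf = LEAF_PORTS - UPLINKS_PER_LEAF
oversubscription = downlinks_per_leaf / UPLINKS_PER_LEAF
max_leaves = SPINE_PORTS              # each leaf consumes one port on every spine
servers = max_leaves * downlinks_per_leaf

print(f"{oversubscription:.0f}:1 oversubscription, up to {servers} servers in two tiers")
```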
Performance Considerations:
Network speed is limited by TCP throughput and by the capabilities of smart NICs; RDMA can offer higher throughput than TCP. Multiple NICs are often used to boost performance, and dual 25 gig NICs are common, providing 25-50 gig per server. Servers typically have ample M.2 flash modules for high I/O capacity. The network stack must be optimized to deliver the required performance.
CXL and Gen Z Impact:
CXL and Gen Z interfaces may impact networking architecture. Applications that benefit from large CXL memory environments may emerge. Persistent memory and large memory configurations may enable new possibilities. However, this is unlikely to change the current cloud computing landscape.
Speed and Performance Improvements:
Moving from 100 gig to 200 gig NICs can improve performance, but quantifying the improvement depends on the workload profile and many other factors. NUMA-aware networking within servers is influenced by changing CPU trends, and future chips may combine what are now multiple sockets into a single package for better interconnectivity.
Economic Trade-offs:
Network bandwidth comes at a cost and must be balanced against the other components. Cloud providers aim to find the optimal spending ratio between storage, servers, and network: the goal is to avoid network bottlenecks without overspending on bandwidth that doesn't improve server performance. Currently the sweet spot is around 25-50 gig per server, with 3:1 oversubscription at the top-of-rack (ToR) switch. As CPU core counts and flash speeds increase, network speed is adjusted accordingly.
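As a concrete example of how the 3:1 figure arises, consider a hypothetical rack of 48 servers with 25 gig NICs and four 100 gig uplinks; the port counts are illustrative, not a quoted configuration.

```python
# Rack-level arithmetic behind a 3:1 ToR oversubscription ratio (illustrative numbers).

SERVERS_PER_RACK = 48
SERVER_NIC_GBPS = 25
UPLINKS = 4
UPLINK_SPEED_GBPS = 100

downlink_gbps = SERVERS_PER_RACK * SERVER_NIC_GBPS   # 1200 Gb/s toward the servers
uplink_gbps = UPLINKS * UPLINK_SPEED_GBPS            # 400 Gb/s toward the spine
print(f"downlink {downlink_gbps} Gb/s, uplink {uplink_gbps} Gb/s, "
      f"oversubscription {downlink_gbps / uplink_gbps:.0f}:1")
```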
Conclusion
The evolution of cloud data centers is marked by a relentless pursuit of higher speed, efficiency, and scalability. Disaggregated storage, rapid advancements in network technology, and the adoption of high-speed connectivity standards are reshaping how data centers operate and evolve. As the industry continues to innovate, challenges related to cost, scalability, and technological integration remain at the forefront. However, with ongoing research and development, the future of cloud data centers looks poised to meet the ever-growing demands of the digital world.
Notes by: WisdomWave