Jeff Dean (Google Senior Fellow) – Exciting Directions for ML Models and the Implications for Computing Hardware (Sep 2023)


Chapters

00:03:24 Implications of Machine Learning Model Trends for Computer Hardware
00:09:36 Machine Learning Models: Sparsity, Adaptivity, and Dynamic Neural Networks
00:13:35 Trends and Considerations for Computer Architects and System Builders in Machine Learning
00:20:52 Redefining Computing Infrastructure for Machine Learning
00:24:34 Accelerated Computing: Overcoming Challenges and Achieving the Next 100x
00:27:47 Designing Computing Systems for Efficiency and Sustainability
00:33:35 Optimizing Machine Learning Performance and Energy Efficiency
00:44:25 Challenges in Scaling Stochastic Gradient Descent
00:47:54 Accelerating Chip Design with Machine Learning
00:55:11 Innovation in Chip Design and Deployment for Machine Learning

Abstract

Revolutionizing Computing: The Future of Machine Learning and System Design

Machine learning (ML) is transforming the landscape of computing, necessitating radical changes in hardware design and system architecture. This article delves into the latest trends and challenges in ML, such as the shift towards more dynamic, sparse, and efficient models, the implications for computer architects, and the urgent need for scalable, sustainable systems. It highlights the evolving metrics for machine learning workloads, including power efficiency, carbon dioxide emissions, and reliability. The role of accelerated chip design and system optimization in adapting to these rapid advancements is also explored, emphasizing the importance of a holistic approach to system design that prioritizes throughput, sustainability, and adaptability.



Exciting Directions for ML Models:

Machine learning models have led to significant advances in computers’ capabilities in areas like image classification, speech recognition, and language understanding. Recent developments have enabled the generation of images, text, and speech from textual descriptions, broadening the range of applications considerably. The emergence of conversational systems that can hold interactive discussions and generate Python code is a testament to their educational utility. There’s a promising trend of fine-tuning general models for specific tasks, such as answering medical exam questions, indicating a surge in domain-specific applications. The rise of multimodal models capable of handling diverse inputs and outputs is another notable advancement.

Implications for Computer Architects and ML Hardware Design:

There’s a growing focus on delivering significant increases in compute capacity and efficiency to keep pace with ML advancements. This requires ML hardware that can accommodate rapidly evolving models, and the fast pace of ML development in turn demands that corresponding hardware be designed and deployed quickly.

Sparsity in Machine Learning Models:

Sparse models, characterized by adaptively activated pathways, offer higher efficiency, greater capacity, and improved accuracy. It is crucial to distinguish coarse-grained sparsity, where entire modules (such as the experts in a mixture-of-experts model) are activated per example, from fine-grained sparsity, where individual weights or small blocks within a tensor are zeroed. Modern hardware’s support for sparsity is now a vital factor for computer architects to consider; the routing sketch below makes the coarse-grained case concrete.
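
As an illustration of coarse-grained sparsity, here is a minimal sketch of top-k mixture-of-experts routing in plain Python/NumPy. Every name and shape is illustrative rather than any production API: a learned gate scores the experts, only the top k actually run for a given token, and their outputs are combined.

```python
import numpy as np

def top_k_expert_routing(x, gate_weights, experts, k=2):
    """Route one token to its top-k experts (coarse-grained sparsity).

    Only k of the experts run per token, so per-token compute stays
    roughly constant even as total capacity (number of experts) grows.
    """
    logits = x @ gate_weights                    # (num_experts,) router scores
    top_k = np.argsort(logits)[-k:]              # indices of the k best experts
    probs = np.exp(logits[top_k] - logits[top_k].max())
    probs /= probs.sum()                         # softmax over chosen experts only
    # Weighted combination of the k activated expert outputs.
    return sum(p * experts[i](x) for p, i in zip(probs, top_k))

# Toy usage: 8 experts, each a random linear map; only 2 run per token.
rng = np.random.default_rng(0)
d, num_experts = 16, 8
experts = [lambda h, W=rng.normal(size=(d, d)) / np.sqrt(d): h @ W
           for _ in range(num_experts)]
gate = rng.normal(size=(d, num_experts))
y = top_k_expert_routing(rng.normal(size=d), gate, experts)
```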

Adaptive Computation and Dynamically Changing Neural Networks:

Adaptive computation allows for varying computational costs, allocating more resources to more complex examples, thus enhancing efficiency. The emerging capability of neural networks to continuously adapt their structure and parameters represents a significant development in the field.
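
A hedged sketch of one common form of adaptive computation, early exiting: an example passes through layers only until an intermediate prediction head is confident, so easy examples spend less compute than hard ones. The layers and classifier heads below are toy stand-ins, not a real framework API.

```python
import numpy as np

def adaptive_forward(x, layers, classifiers, confidence=0.9):
    """Early-exit inference: stop as soon as an intermediate head is
    confident, so average cost per example tracks example difficulty."""
    for depth, (layer, head) in enumerate(zip(layers, classifiers), start=1):
        x = layer(x)
        probs = head(x)                     # intermediate prediction
        if probs.max() >= confidence:       # confident enough: exit early
            return probs, depth             # depth = compute actually spent
    return probs, depth                     # hard example: used every layer

# Toy components: random tanh layers sharing one softmax head.
rng = np.random.default_rng(1)
d, n_layers, n_classes = 8, 6, 3
W = rng.normal(size=(d, n_classes))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()
layers = [lambda h, A=rng.normal(size=(d, d)) / np.sqrt(d): np.tanh(h @ A)
          for _ in range(n_layers)]
classifiers = [lambda h: softmax(h @ W)] * n_layers
probs, layers_used = adaptive_forward(rng.normal(size=d), layers, classifiers)
```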

Pathways System:

The Pathways system is designed for dynamic resource management, facilitating hardware addition or removal during runtime and managing communication across multiple network transports. This system is a crucial development in handling dynamic computing needs.
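
Pathways itself is not publicly available in this form, so the following is only a hypothetical sketch of the core idea, that accelerators can join or leave a running pool while work keeps flowing. Nothing here reflects the real Pathways API.

```python
import queue, threading

class DevicePool:
    """Hypothetical illustration of runtime resource management in the
    spirit of Pathways: devices may be added or removed mid-run."""

    def __init__(self):
        self.devices = set()
        self.lock = threading.Lock()
        self.tasks = queue.Queue()

    def add_device(self, name):            # hardware added during runtime
        with self.lock:
            self.devices.add(name)

    def remove_device(self, name):         # hardware drained and removed
        with self.lock:
            self.devices.discard(name)

    def submit(self, fn):
        self.tasks.put(fn)

    def dispatch_one(self):
        """Run the next queued task on any currently live device."""
        fn = self.tasks.get()
        with self.lock:
            device = next(iter(self.devices))   # raises if pool is empty
        return fn(device)

pool = DevicePool()
pool.add_device("tpu:0")
pool.submit(lambda dev: f"ran on {dev}")
print(pool.dispatch_one())                 # -> ran on tpu:0
pool.remove_device("tpu:0")                # safe once its work is drained
```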

Model Trends:

The trend in machine learning is shifting towards single models capable of handling multiple tasks. There’s a move towards sparse models for increased efficiency and an emphasis on handling various inputs and outputs within a single model.

Key Takeaways for Architects:

Architects should focus on connectivity, bandwidth, and latency in accelerators; the significance of scale for effective training and inference; the influence of sparse models on memory capacity and routing; and the need for user-friendly ML software for expressing complex models. Power efficiency, sustainability, and reliability are essential considerations in this context.

CO2 Emissions in Machine Learning Training:

Earlier estimates substantially overstated the CO2 emissions of machine learning training and have since been corrected with more accurate data. Advances in more efficient transformer variants have also played a role in reducing carbon emissions.
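
The accounting behind such estimates is simple enough to sketch: energy drawn by the accelerators, multiplied by the datacenter overhead factor (PUE), multiplied by the carbon intensity of the local grid. All numbers below are illustrative placeholders, not measured values.

```python
def training_co2e_kg(accelerator_hours, avg_power_watts,
                     pue=1.1, grid_gco2e_per_kwh=200.0):
    """Back-of-the-envelope CO2e for a training run.

    emissions = accelerator energy * datacenter overhead (PUE)
                * carbon intensity of the local grid.
    """
    energy_kwh = accelerator_hours * avg_power_watts / 1000.0
    return energy_kwh * pue * grid_gco2e_per_kwh / 1000.0   # grams -> kg

# e.g. 1,000 accelerators for 100 hours at 300 W average draw:
print(training_co2e_kg(1000 * 100, 300))  # ~6,600 kg CO2e under these assumptions
```

Because the grid-intensity term varies by more than an order of magnitude across locations and times of day, where and when a model is trained can matter as much as how efficiently it is trained.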

System Design for General AI:

There is a pressing need to rethink system design comprehensively, with a focus on overall system throughput rather than component-level optimization. This approach prioritizes system-level optimization.

Exponential Growth in Model Parameters:

The number of parameters in dense models has been growing at an exponential rate, with a corresponding increase in the computation cost of training and serving these models.

Conventional Wisdom Thrown Out the Window:

The traditional belief that general-purpose compute is sufficient for all computations is no longer tenable in the current landscape of machine learning.

TPUs and Specialized Hardware Innovations:

The emergence of TPUs and other specialized hardware architectures to meet the unique demands of machine learning represents a significant shift. Innovations in TPU technology include high-bandwidth synchronous interconnects for connectivity between computing units, liquid cooling for enhanced power efficiency, specialized data representations for optimized performance, optical circuit switching for reliable large-scale computer connectivity, hardware tailored for efficient scatter-gather operations and dense matrix multiplications, and high-bandwidth memory stacked on compute units for low latency and high bandwidth.
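
To make one of those items concrete: the specialized data representations include reduced-precision formats such as bfloat16, which keeps float32’s 8 exponent bits (so the same dynamic range) but only 7 mantissa bits, halving memory and bandwidth at a precision most ML workloads tolerate. Below is a minimal sketch of the conversion by bit truncation; real hardware rounds to nearest, truncation is shown only for simplicity.

```python
import struct

def to_bfloat16_bits(x):
    """Convert a float32 to a bfloat16 bit pattern by truncation."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16                    # upper 16 bits = bfloat16 pattern

def from_bfloat16_bits(bits16):
    """Widen a bfloat16 bit pattern back to float32 (exact)."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]

x = 3.14159265
bx = from_bfloat16_bits(to_bfloat16_bits(x))
print(x, "->", bx)    # 3.14159265 -> 3.140625 (7 mantissa bits of precision)
```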



The future of ML demands systems that are scalable, efficient, and holistic in design, with a focus on power, sustainability, and reliability.

Shifting the Focus of Computing Metrics from Headline Performance to System Performance per Watt and Carbon Dioxide Emissions:

The traditional metric of performance per chip is becoming obsolete, with factors such as reliability, power consumption, and carbon dioxide emissions becoming increasingly important. The metric of system performance per average watt encourages efficient utilization of available power capacity, and it’s crucial to consider the entire system, not just the peak performance of individual components. Similarly, the metric of system performance per carbon dioxide emissions emphasizes the environmental impact of data centers, urging minimization of carbon dioxide emissions throughout the lifecycle of the infrastructure.
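
A minimal sketch of the perf-per-average-watt metric as described, with illustrative units: throughput is divided by the measured average draw of the whole system, not by per-chip peak TDP, which rewards designs that keep provisioned power busy.

```python
def perf_per_avg_watt(examples_processed, wall_seconds, avg_system_watts):
    """System performance per *average* watt.

    avg_system_watts should be the measured average draw of the whole
    system (accelerators, hosts, network, cooling share), not chip TDP.
    """
    throughput = examples_processed / wall_seconds     # examples / s
    return throughput / avg_system_watts               # examples / s / W

# A system sustaining 50,000 examples/s at 40 kW average draw:
print(perf_per_avg_watt(50_000 * 3600, 3600, 40_000))  # 1.25 examples/s/W
```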

Optimizing Power Consumption:

Improving throughput while reducing power consumption involves understanding the power characteristics of jobs and making appropriate adjustments. System-level scheduling optimizations can lead to significant efficiency improvements. A cell-wide control plane can manage power consumption and optimize scheduling, spreading high-power jobs across multiple busbars to avoid overloading.
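
As a toy illustration of the spreading idea (the real cell-wide control plane is far more sophisticated), here is a greedy scheduler that places the highest-power jobs first on the currently least-loaded busbar, refusing any placement that would exceed a busbar’s capacity.

```python
def assign_to_busbars(jobs_watts, busbar_capacity_watts, num_busbars):
    """Greedy sketch: biggest jobs first, each onto the least-loaded busbar."""
    loads = [0.0] * num_busbars
    placement = {}
    for job, watts in sorted(jobs_watts.items(), key=lambda kv: -kv[1]):
        bus = min(range(num_busbars), key=loads.__getitem__)
        if loads[bus] + watts > busbar_capacity_watts:
            raise RuntimeError(f"no busbar can safely host {job}")
        loads[bus] += watts
        placement[job] = bus
    return placement, loads

jobs = {"train-a": 90_000, "train-b": 80_000,
        "serve-c": 30_000, "serve-d": 20_000}
placement, loads = assign_to_busbars(jobs, busbar_capacity_watts=120_000,
                                     num_busbars=2)
print(placement, loads)   # the two high-power jobs land on different busbars
```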

Reliability and Silent Data Corruption:

As machine learning workloads run synchronously across thousands of compute nodes, reliability becomes a critical concern. Silent data corruption poses a growing challenge that affects both CPUs and accelerated compute chips.

Benchmarking and Design Targets:

In designing computing systems for ML, power, reliability, and carbon dioxide equivalents are key benchmarks to consider.

Rise in Computing Demand:

There has been a staggering 10x annual growth in model parameters, leading to a corresponding increase in computing demands.

Radical Shift from General-Purpose Compute:

The demands of ML are driving a fundamental shift in computing systems, distinctly different from traditional general-purpose computers.

Accelerated Computing Progress:

Advancements in accelerated computing have surpassed those in general-purpose compute, enabling various breakthroughs in the field.

Computational Needs and Future Challenges:

The increasing complexity of models and data demands computational capabilities that extend beyond current accelerated computing solutions.

System Optimization Focus:

The focus is now on optimizing for system throughput, power, reliability, and carbon footprint, a shift from previous approaches.

Current Metrics Limitations:

Traditional metrics, such as chip performance alone, are insufficient for a comprehensive evaluation of computing systems.

Headline Numbers vs. System Performance:

Common metrics often fail to represent the actual performance of computations across complex systems.

MLPerf Benchmark Limitations:

The MLPerf benchmark, which focuses on performance at a given system size, tends to neglect other critical factors like system costs, emissions, efficiency, and power consumption.

System Performance over TCO:

Google assesses system designs based on performance over the total cost of ownership (perf/TCO), including both capital and operating expenditures.
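
A minimal sketch of that perf/TCO calculation under simple assumptions: capital expenditure is amortized over the system’s service life, operating expenditure is dominated by power, and every figure below is illustrative.

```python
def perf_per_tco(throughput, capex, lifetime_years, annual_opex):
    """Performance over total cost of ownership (capex + opex).

    capex: chips, racks, datacenter share, amortized over service life.
    annual_opex: dominated by electricity and cooling.
    """
    annual_tco = capex / lifetime_years + annual_opex
    return throughput / annual_tco       # e.g. examples/s per dollar-year

# $10M of hardware amortized over 4 years, $500k/yr power, 50k examples/s:
print(perf_per_tco(50_000, 10_000_000, 4, 500_000))   # ~0.0167
```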

Perf/TCO Assumptions and Limitations:

The assumptions of perf/TCO regarding data center capacity and power attribution are being reevaluated to better align with current needs.

Evolving Considerations:

Google is evolving its considerations to address the limitations of perf/TCO, reflecting a shift in evaluation criteria for computing systems.

TPU Architecture Evolution:

Over the past eight years, Google’s TPU architecture has evolved significantly, adapting to increasingly complex computational problems.

Perf/TCO: An Evolving Metric for Machine Learning Workloads:

Traditional performance metrics are proving insufficient for evaluating ML workloads, leading to the development of new metrics that account for reliability, power consumption, and carbon dioxide emissions.

System Performance per Average Watt:

There is an increased emphasis on system performance per average watt, encouraging better power utilization in computing systems.

System Performance per Carbon Dioxide Emissions:

The focus is also on system performance relative to carbon dioxide emissions, highlighting the need for environmentally sustainable computing solutions.

Exploring Silent Data Corruption in Compute Elements:

Silent data corruption (SDC) is a growing challenge in large-scale computing, leading to incorrect results and potentially corrupting entire computations. Monitoring gradient norms can indicate SDC, but differentiating it from normal behavior is challenging. Rapid checkpointing and restart mechanisms are utilized to mitigate the impact of SDC.
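
One plausible shape for that monitoring-plus-restart loop, with an illustrative z-score test standing in for whatever statistical check a production system would actually use; the state and step function are generic placeholders.

```python
import numpy as np

def sdc_guarded_step(step_fn, state, history, z_threshold=6.0):
    """Take one training step, but roll back if the gradient norm looks
    implausible relative to its recent distribution (suspected SDC).

    Distinguishing true corruption from a genuine loss spike is the hard
    part; the z-score test here is only an illustrative stand-in.
    """
    checkpoint = state.copy()              # cheap, frequent checkpointing
    new_state, grad_norm = step_fn(state)  # step_fn returns (state, |grad|)
    if len(history) >= 50:                 # need a baseline distribution first
        mu = np.mean(history)
        sigma = np.std(history) + 1e-12
        if abs(grad_norm - mu) / sigma > z_threshold:
            # Suspected silent data corruption: restart from the checkpoint,
            # ideally re-running the step on different hardware.
            return checkpoint, history
    history.append(grad_norm)
    return new_state, history
```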

Machine Learning in Hardware Design:

Machine learning is increasingly being used to accelerate and improve the design of specialized hardware. This includes architectural exploration, synthesis, verification, placement optimization, and customization of hardware design parameters and compilers for specific models. The development of ML-based tools and methodologies is revolutionizing hardware design.
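
Among those uses, placement optimization is the simplest to illustrate. Learned placers are typically trained against a wirelength proxy such as half-perimeter wirelength (HPWL); the toy below pairs that objective with a trivial hill climber rather than a learned policy, purely to show what is being optimized.

```python
import random

def hpwl(placement, nets):
    """Half-perimeter wirelength: a standard proxy cost that placement
    optimizers (including learned ones) try to minimize."""
    total = 0.0
    for net in nets:                       # each net connects several cells
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Toy hill climber over random moves; an ML placer would instead learn a
# policy whose proposals are scored by this same kind of objective.
random.seed(0)
cells = ["a", "b", "c", "d"]
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
placement = {c: (random.random(), random.random()) for c in cells}
for _ in range(1000):
    c = random.choice(cells)
    old = placement[c]
    placement[c] = (random.random(), random.random())
    if hpwl(placement, nets) >= hpwl({**placement, c: old}, nets):
        placement[c] = old                 # revert moves that don't help
print(round(hpwl(placement, nets), 3))     # wirelength after optimization
```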

Opportunities for ML in Chip Design and Manufacturing:

Machine learning capabilities are rapidly advancing, enabling fundamental changes in the computing community. ML models are becoming increasingly dynamic with evolving structures. The focus should be on system throughput rather than chip headline performance. Metrics such as power, CO2e efficiency, and SDCs are crucial to measure and improve.

Shorter Timelines for Chip Design and Deployment:

Shorter timelines are essential to adapt quickly to the changing ML landscape. ML automation can help streamline the design process.

Challenges in ML Automation for EDA:

Challenges in ML automation for electronic design automation (EDA) include data availability, physics-aware modeling, and the involvement of human experts.


Notes by: ZeusZettabyte