John Hennessy (Alphabet Chairman) – Language Consortium Keynote (May 2019)
Chapters
00:01:39 RISC Architecture: A Revolution in Computer Design
John Hennessy’s Leadership at Stanford University: John Hennessy’s leadership at Stanford University had a profound impact on the institution’s reputation and growth. As president, he oversaw the expansion of the engineering quad and established the Knight-Hennessy Scholars Program, the largest fully endowed graduate-level scholarship program in the world. Hennessy takes leadership seriously; he explored the nature and teachability of leadership in his book Leading Matters.
John Hennessy’s Founding of MIPS and Contributions to RISC Architecture: In the early 1980s, Hennessy led the Stanford MIPS project and went on to found MIPS Computer Systems, a pioneer of the Reduced Instruction Set Computer (RISC) architecture. RISC simplified the instruction set and turned the CPU into a simple pipeline for faster processing. The MIPS RISC architecture inspired subsequent processors as well as domain-specific processors like DSPs, GPUs, and TPUs. Hennessy’s contributions to RISC also influenced the PISA (Protocol Independent Switch Architecture) used by Tofino.
John Hennessy’s Textbooks and the Turing Award: Hennessy co-authored, with Dave Patterson, two bestselling computer architecture textbooks that have been in worldwide use for over 30 years. The two received the 2017 Turing Award for their contributions to RISC architecture and for those influential textbooks.
John Hennessy’s Early Experience with Micro-Coded Machines: Before RISC, Hennessy and Patterson worked on micro-coded machines, which had hardware and interpretive overhead. This experience motivated their vision of making compilers more powerful and bringing them closer to the architecture.
The Microprocessor’s Dominance and Future Challenges: Hennessy reflects on the unforeseen dominance of the microprocessor in the computer industry. He emphasizes the need to address future challenges such as energy efficiency, security, and the growing complexity of software.
00:06:25 Computing Technology Evolution and Its Impact
Moore’s Law and Dennard Scaling: Moore’s Law, which predicted a doubling of transistors every two years, has been a guiding principle in the semiconductor industry for decades. Dennard scaling, which predicted that power per square millimeter would remain constant as transistors got smaller, has been an equally important factor in the industry’s progress. Dennard scaling has now ended, however, and Moore’s Law itself is slowing, leading to a slowdown in performance gains.
Instruction-Level Parallelism and Amdahl’s Law: Instruction-level parallelism (ILP) was a key factor in improving performance in the past, but it is now reaching its limits. Amdahl’s Law, which states that the speedup from parallelization is limited by the fraction of the program that must run serially, has not been repealed or overcome. This means that further improvements in ILP will have a diminishing impact on performance.
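To make that limit concrete, here is a minimal Python sketch of Amdahl’s Law (my own illustration, not code from the talk); the 10% serial fraction is an assumed example value.

```python
# Minimal sketch of Amdahl's Law: speedup is capped by the serial fraction.
def amdahl_speedup(serial_fraction, n_processors):
    """Overall speedup when only (1 - serial_fraction) of the work parallelizes."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Even with 1024 processors, a 10% serial portion limits speedup to under 10x.
for cores in (2, 8, 64, 1024):
    print(cores, round(amdahl_speedup(0.10, cores), 2))
```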
Shift in Application: The most important computers are no longer on desktops, but in pockets and the cloud. This shift in application has led to a change in how we think about computing.
Performance Trends: Single-core performance growth has slowed dramatically in recent years, from roughly 52% per year to about 3.5% per year. DRAM has also slowed, with each DDR generation arriving later than predicted. Moore’s Law has not stopped outright, but the industry is now off by roughly a factor of 10 from what the law would predict.
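As a rough illustration of how quickly those two growth rates diverge, the short sketch below compounds them over a decade (the rates are the ones quoted above; the ten-year window is my own choice).

```python
# Rough illustration of the growth rates quoted above, compounded over ten years.
years = 10
fast = 1.52 ** years   # ~52% per year, the historical single-core growth rate
slow = 1.035 ** years  # ~3.5% per year, the recent growth rate
print(f"10 years at 52%/yr:  {fast:.0f}x")
print(f"10 years at 3.5%/yr: {slow:.1f}x")
print(f"gap after a decade:  ~{fast / slow:.0f}x")
```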
00:11:35 The End of Moore's Law and the Rise of Energy Efficiency
Moore’s Law Slowdown and Dennard Scaling End: Moore’s Law has slowed, with the cost per transistor now dropping more slowly than before. Dennard scaling, which kept power density roughly constant as transistors shrank, has ended, so packing in more transistors now means more power.
Energy Efficiency and Power Consumption: Energy efficiency has become critical as power consumption increases. Optimizing power becomes a significant issue, especially for battery-powered devices and cloud data centers.
Thermal Power Limit Reached: Processors have reached their thermal power limit, preventing further power increases. Techniques like clock slowdown and core turn-off are used to prevent overheating.
End of the “Magic” Era: The traditional approach of relying on hardware improvements to boost performance without considering efficiency is no longer viable. Software optimizations are now essential for improving energy efficiency.
Cache as an Example: Cache, a beloved concept in computer architecture, is an example of a component that needs to be re-examined for energy efficiency.
00:15:49 Limits of Speculation and Instruction-Level Parallelism
Diminishing Returns in Efficiency: Increasing cache size yields diminishing returns in power efficiency; beyond a point, larger caches burn energy without proportional hit-rate gains. Locality of reference is what makes caches pay off. Deep pipelines and high clock rates likewise hit diminishing returns in efficiency because they depend on ever more speculative execution.
The Dilemma of Speculation: Speculation is essential for high performance in modern processors. Meltdown and Spectre vulnerabilities highlight the security risks associated with speculation. Disabling speculation can significantly degrade performance.
The Challenge of Accurate Branch Prediction: Modern processors rely on accurate branch prediction to achieve high performance. Predicting branches with high accuracy is challenging due to the large number of instructions in flight. Inaccurate branch prediction results in wasted instructions and energy.
The Limits of Speculative Execution: A significant portion of instructions in modern processors are speculatively executed and then discarded. Wasted instructions consume energy and degrade performance. Restoring the state of the processor after incorrect branch prediction also consumes energy.
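The sketch below is a back-of-the-envelope model, with assumed accuracy and window-depth numbers of my own choosing, of why deep speculation throws work away: with many unresolved branches in flight, the chance that all of them were predicted correctly drops quickly.

```python
# Back-of-the-envelope model (assumed numbers, not figures from the talk) of why
# deep speculation wastes work: with many branches in flight, even an accurate
# predictor is frequently wrong about at least one of them.
per_branch_accuracy = 0.95   # assumed prediction accuracy per branch
branches_in_flight = 15      # assumed number of unresolved branches in a deep window

p_all_correct = per_branch_accuracy ** branches_in_flight
print(f"Probability every in-flight branch is predicted correctly: {p_all_correct:.2f}")
print(f"Probability some speculative work gets thrown away:        {1 - p_all_correct:.2f}")
```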
The Shift Towards Multicore Architectures: The challenges of speculative execution have led to a shift towards multicore architectures. Multicore architectures offer increased parallelism and improved energy efficiency.
00:19:17 Limits of Conventional Multicore Processors
Background: John Hennessy discusses the shift from ILP to multicore processors and the associated challenges.
Energy and Performance Trade-off: Energy consumption increases proportionally to the number of active cores. Performance must scale at a similar rate to avoid energy inefficiency.
Amdahl’s Law and Serialization: Amdahl’s Law limits the speedup achievable by parallelizing applications. Serial portions of the code can significantly reduce the overall performance gains. Data centers often encounter serialization issues that limit complete parallelization.
Packaging and Thermal Constraints: Packaging technology improves thermal dissipation by about 5% annually. High-end 24-core processors can run at 3.4 GHz in turbo mode. However, running all cores at this speed exceeds power and thermal limits.
Power Limitations: Running 24 cores at 3.4 GHz would require 255 watts, exceeding the power limits of PC-like devices. 64-core processors would require even more power, making it challenging to dissipate heat effectively.
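A quick sanity check of that power argument, using the figures quoted above plus an assumed package power budget of my own choosing:

```python
# Rough power sanity check for the 24-core example above.
watts_per_core_at_turbo = 255 / 24   # ~10.6 W/core, implied by the 255 W figure quoted
tdp_limit = 165                      # assumed package power budget (placeholder value)

for cores in (24, 64):
    demand = cores * watts_per_core_at_turbo
    status = "exceeds" if demand > tdp_limit else "fits within"
    print(f"{cores} cores at full turbo: ~{demand:.0f} W ({status} a {tdp_limit} W budget)")
```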
Conclusion: The combination of Amdahl’s Law and power constraints limits the scalability of multicore processors. Finding ways to reduce serialization and improve energy efficiency remains a significant challenge.
00:22:08 Challenges and Opportunities in Hardware and Software Efficiency
Thermal Limitations and Instruction Set Efficiency: Thermal dissipation limits the number of cores that can be active in a processor. That power limit compounds the Amdahl’s Law effect, further reducing parallel performance. Instruction set efficiency therefore becomes a key driver for power-constrained devices.
The Shortcoming of Modern Programming Languages: Modern programming languages prioritize software productivity over execution efficiency. Python, as an example, can be highly inefficient in execution compared to C.
Hardware-Centric Approach and Domain-Specific Ideas: General-purpose processors have hit a dead end in performance improvement. Domain-specific architectures are the only viable path forward, and domain-specific languages will be crucial for programming them.
Efficiency Gains through Optimization: A study by MIT researchers demonstrates significant efficiency improvements in matrix multiplication. Optimizations such as using C, parallel loops, memory blocking, and vector instructions resulted in a 65,000-fold speedup. Potential for substantial performance gains through various techniques.
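The small Python sketch below gives only the flavor of that optimization ladder, comparing interpreted loops against a single vectorized BLAS call on modest matrices; the study’s 65,000-fold figure came from larger matrices and the full set of steps (C, parallel loops, memory blocking, vector instructions).

```python
# Interpreted Python loops vs. a vectorized BLAS call for matrix multiplication.
import time
import numpy as np

n = 128
a = np.random.rand(n, n)
b = np.random.rand(n, n)
a_rows, b_cols = a.tolist(), b.T.tolist()   # plain Python lists for the interpreted version

def naive_matmul(a_rows, b_cols):
    """Nested interpreted loops, the starting point of the optimization ladder."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a_rows]

t0 = time.perf_counter(); c_naive = naive_matmul(a_rows, b_cols); t1 = time.perf_counter()
t2 = time.perf_counter(); c_blas = a @ b;                         t3 = time.perf_counter()

assert np.allclose(c_naive, c_blas)
print(f"interpreted loops: {t1 - t0:.3f} s, vectorized BLAS: {t3 - t2:.5f} s, "
      f"speedup ~{(t1 - t0) / (t3 - t2):.0f}x")
```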
Tailoring Architectures to Application Needs: Domain-specific architectures aim to achieve performance by closely aligning with application requirements. Examples include GPUs for graphics, network processors, and deep learning accelerators.
00:26:25 Novel Approaches to Processor Design for Domain-Specific Languages
Key Insights: Energy efficiency is crucial for modern computing, and there are significant energy savings to be gained by reducing control overhead and optimizing memory hierarchy usage.
Efficiency Opportunities: Register file access consumes 60 times more energy than a 32-bit addition, and L1 cache access consumes 100 times more energy. Control overhead accounts for a large portion of energy consumption, often exceeding the energy used for the actual arithmetic operations. Caches are efficient when they work, but can cause significant overhead when they miss or have excessive latency.
Domain-Specific Architectures (DSAs): DSAs offer several advantages over general-purpose processors: simpler parallelism models (e.g., SIMD) for greater efficiency; more efficient use of the memory hierarchy through user-controlled mechanisms; elimination of unnecessary accuracy; and programming models that better match the hardware, enabling more efficient compilation.
Domains Suitable for DSAs: Networking and deep learning are promising domains for DSA implementation due to their high interest and rapid growth.
Systolic Arrays: Systolic arrays, which rely on nearest-neighbor communication, offer low energy consumption and high performance. They excel at dense linear algebra operations such as matrix multiplication, which dominate deep learning workloads.
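As an illustration of the idea, here is a toy, cycle-by-cycle simulation of an output-stationary systolic array computing a matrix product with only nearest-neighbor data movement; it is my own sketch of the general technique, not the TPU’s actual design.

```python
# Toy simulation of an output-stationary systolic array computing C = A @ B.
# Each processing element (PE) only ever reads from its left and top neighbors.
import numpy as np

def systolic_matmul(A, B):
    n = A.shape[0]
    a_reg = np.zeros((n, n))     # value held in each PE's horizontal register
    b_reg = np.zeros((n, n))     # value held in each PE's vertical register
    acc = np.zeros((n, n))       # per-PE accumulator for C[i, j]
    for t in range(3 * n - 2):   # enough cycles for all operands to flow through
        new_a = np.zeros((n, n))
        new_b = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                # Operands arrive from the left/top neighbor, or from the skewed edge feed.
                new_a[i, j] = a_reg[i, j - 1] if j > 0 else (A[i, t - i] if 0 <= t - i < n else 0.0)
                new_b[i, j] = b_reg[i - 1, j] if i > 0 else (B[t - j, j] if 0 <= t - j < n else 0.0)
                acc[i, j] += new_a[i, j] * new_b[i, j]   # multiply-accumulate in place
        a_reg, b_reg = new_a, new_b
    return acc

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```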
Performance and Energy Efficiency of DSAs: DSAs can deliver significantly better performance per watt compared to general-purpose processors and even GPUs. Roofline models illustrate the relationship between arithmetic intensity, memory bandwidth, and arithmetic bandwidth in determining performance.
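A roofline model reduces to a single min(): attainable performance is capped by memory bandwidth at low arithmetic intensity and by peak arithmetic throughput at high intensity. The peak numbers in the sketch below are placeholders, not figures from the talk.

```python
# Minimal roofline sketch: attainable performance is the lesser of the memory-bound
# and compute-bound ceilings. Peak values here are assumed placeholders.
def roofline(arithmetic_intensity_flops_per_byte,
             peak_flops=90e12,        # assumed peak arithmetic throughput (FLOP/s)
             peak_bandwidth=900e9):   # assumed memory bandwidth (bytes/s)
    return min(peak_flops,
               arithmetic_intensity_flops_per_byte * peak_bandwidth)

for ai in (1, 10, 100, 1000):         # FLOPs performed per byte moved from memory
    print(f"intensity {ai:>4}: attainable ~{roofline(ai) / 1e12:.1f} TFLOP/s")
```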
Demand for DSA Performance: The demand for high-performance computing in domains such as deep learning is rapidly growing, driven by the need to train large neural networks.
00:38:06 Rethinking Processor Architectures for Specialized Domains
Compute Demand and Training Time: Domain-specific architectures (DSAs) enable more efficient training of complex models like GANs and reinforcement learning agents, which require immense computational power. The compute needed to train such models has grown far faster than conventional processors can deliver, demanding specialized hardware with massive compute capability.
Object Code Compatibility and DSL Interface: Traditional processors maintain object code compatibility, ensuring backward compatibility with existing software. DSA breaks away from this convention, allowing changes to the underlying architecture as long as the compiler can efficiently compile to it. This flexibility enables rapid development and iteration of domain-specific architectures, such as Google’s Tensor Processing Units (TPUs).
TPUs and Their Advantages: TPUs are specialized hardware designed for training deep learning models. They offer significant advantages in terms of performance and energy efficiency compared to general-purpose processors. TPUs can be stacked to build large-scale supercomputers with high bandwidth memory and liquid cooling.
Rethinking Architecture and Software Models: DSA challenges conventional notions of architecture design and requires rethinking the interface between software models and hardware. It opens up new possibilities for optimizing performance and energy efficiency by tailoring hardware specifically to the needs of particular applications. This approach allows for faster prototyping and experimentation with different hardware designs.
The Importance of Tight Integration: Success in the domain-specific world requires tight integration across multiple levels of the stack, from applications to underlying architecture. It involves understanding application characteristics, compiler optimization techniques, domain-specific languages, and the appropriate hardware architecture. By achieving this integration, it is possible to overcome the limits of Dennard scaling and Moore’s law and deliver remarkable performance for high-performance applications.
Dave Kuck’s Observation: Dave Kuck, who led software for the ILLIAC IV supercomputer, made a poignant observation in 1975: despite having built advanced hardware, the team struggled to use it effectively, highlighting the need for closer integration between hardware and software.
Conclusion: Domain-specific architecture offers a promising approach to address the challenges posed by the limits of Dennard scaling and Moore’s law. It enables the development of specialized hardware tailored to specific applications, delivering remarkable performance and energy efficiency. Tight integration across multiple levels of the stack is crucial for successful implementation of DSA.
00:43:58 Emerging Techniques in Domain-Specific Architectures
Details: Exploring domain-specific architectures for improved performance.
Approaching Domain-Specific Architectures: Exposing memory behavior of code allows for better compilation. Deep pipelining and parallelism can improve performance. Ideas from vector machines can be adapted and utilized.
Addressing Memory Latency: Multi-threading and software prefetch methods are used to tackle memory latency. These techniques require an understanding of the underlying code.
Approximation Techniques: Approximations, such as using 8-bit arithmetic, can yield significant efficiency gains. Different data formats, like 16-bit floating point with a larger exponent and smaller mantissa, are being explored. Converting codes to single precision can enable finer grid sizes and improved predictive results.
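The sketch below illustrates both approximations in a few lines of Python: truncating float32 values to a bfloat16-style format (same exponent width, shorter mantissa) and linear 8-bit quantization with a per-tensor scale. Both are my own illustrations of the general idea rather than any specific accelerator’s implementation.

```python
# Two simple approximations: bfloat16-style truncation and linear int8 quantization.
import numpy as np

def to_bfloat16_like(x):
    """Keep the top 16 bits of each float32 value (8-bit exponent, 7-bit mantissa)."""
    bits = x.astype(np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

def quantize_int8(x):
    """Map values linearly onto int8 with a single per-tensor scale."""
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

x = np.random.randn(5).astype(np.float32)
print("float32 :", x)
print("bfloat16:", to_bfloat16_like(x))
q, s = quantize_int8(x)
print("int8    :", q * s)   # dequantized back for comparison
```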
Noise and Statistical Nature of Deep Learning: Deep learning’s inherent noise and statistical nature make it a suitable domain for approximation techniques. Techniques such as removing neurons with insignificant output values are employed during training.
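A minimal sketch of that pruning idea, using magnitude-based weight pruning with an arbitrarily chosen 90th-percentile threshold (my illustration of the general approach, not the specific technique referenced in the talk):

```python
# Magnitude-based pruning: drop weights whose absolute value is negligible.
import numpy as np

weights = np.random.randn(8, 8)
threshold = np.percentile(np.abs(weights), 90)   # prune the smallest ~90% by magnitude
mask = np.abs(weights) >= threshold
pruned = weights * mask                          # zero out the insignificant weights
print(f"kept {mask.sum()} of {weights.size} weights "
      f"({100 * mask.mean():.0f}% density after pruning)")
```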
Future Trends: Continued exploration of approximation techniques, particularly in the training phase, is anticipated.
00:48:22 Networking for Artificial Intelligence and Beyond
Training and Networking: AI networks may primarily interconnect systems engaged in learning and training processes. Networks will likely have two distinct applications: handling large amounts of data for training and transmitting video content.
Top-of-Rack Switch Capacity: Current top-of-rack switches can handle Netflix’s peak global data traffic at any given moment. The ratio of data handled for training versus video streaming is significant.
ML and Networking Integration: The discussion shifts to the integration of ML and reduced functions within switches. The question arises whether networking experts or ML specialists will lead this integration or if a merger will occur.
Hardware Fracturing: The technology space is becoming more fragmented, leading to variations in hardware and packaging. Different devices may require specialized ML processors, such as for camera or voice recognition.
Beyond First-Generation ML: The discussion moves beyond first-generation ML, acknowledging that this field is still in its early stages. The future of AI is uncertain, but it is expected to evolve and potentially move beyond supervised learning.
00:51:03 Machine Learning Challenges and Opportunities
Challenges in Scaling Supervised Learning: Supervised learning is effective for specific tasks with sufficient labeled data. However, it falls short in achieving artificial general intelligence.
Energy Efficiency Gap: Current ML systems consume significantly more energy compared to the human brain.
Human Learning and Evolution: Human learning involves observation, trial and error, and benefits from millions of years of evolution. ML lacks this natural learning ability and relies on large labeled datasets.
Opportunities for Material Science: Quantum computing and carbon nanotubes hold potential for future advancements. Exploring 3D stacking and innovative packaging technologies could improve energy efficiency.
Long-Term Investment and Benchmarking: Decades of Moore’s Law-driven progress have set a very high benchmark that any replacement technology must beat. Investments in long-term research (10+ years) are necessary to stay competitive.
Potential of Silicon and Optical Integration: Better integration of silicon and optical materials could enhance communication efficiency. Applications include long-distance communication and possibly board-to-board connections.
Affordability of Silicon Fabrication: Silicon fabrication is becoming more accessible in university environments. This enables greater experimentation and research opportunities.
Availability of Fabs and Architectural Innovation: The slowdown in the advancement of leading-edge fabs has made them more accessible, allowing for more architectural innovation in universities and startups. As long as the bleeding edge is avoided, the availability of fabs is expected to increase. Cost-sensitive applications can benefit from this increased availability.
Challenges in Quantum Computing: Feynman’s work on reversible computing and his inspiration for quantum computing are mentioned. A significant challenge in quantum computing is maintaining a large system state in a coherent mode, as even minor disturbances can disrupt it. Building large machines with low error rates in qubits is crucial for creating quantum computers that are interesting enough for practical use.
Side Effects of Quantum Technology: While the physics of quantum technology is fascinating and there are many potential side effects, the realization of a practical quantum computer capable of performing interesting computations remains uncertain.
Abstract
Exploring the Evolution of Computing: From RISC Innovation to Domain-Specific Architectures, Quantum Computing, and Artificial Intelligence
—
In the rapidly evolving landscape of computing, the journey from the pioneering days of Reduced Instruction Set Computing (RISC) to the contemporary focus on Domain-Specific Architectures (DSAs), quantum computing, and artificial intelligence (AI) represents a monumental shift. Central figures like John Hennessy and David Patterson laid the groundwork with RISC, influencing today’s processors, DSPs, GPUs, and TPUs. Their work, culminating in the Turing Award, revolutionized computer architecture. Concurrently, challenges like the slowdown of Moore’s Law and the end of Dennard Scaling have propelled the shift towards energy efficiency and specialized architectures. This article explores the trajectory of these developments, highlighting the critical transition from general-purpose computing to a future dominated by DSAs, quantum computing, and the ever-growing importance of energy efficiency and approximation techniques in processing.
—
John Hennessy’s Leadership and Contributions to RISC Architecture
John Hennessy’s contributions to the computing field are profound and span leadership, research, and education. As president of Stanford University, he oversaw the expansion of the engineering quad and established the Knight-Hennessy Scholars Program, the largest fully endowed graduate-level scholarship program in the world. His commitment to leadership extends to his book, Leading Matters, exploring the nature and teachability of leadership.
Hennessy’s influence in computer architecture began with the Stanford MIPS project in the early 1980s and the founding of MIPS Computer Systems, pioneering the RISC architecture. RISC simplified the instruction set and turned the CPU into a simple pipeline for faster processing. This inspired subsequent processors and domain-specific processors like DSPs, GPUs, and TPUs. Hennessy’s role in RISC also influenced the PISA architecture used by Tofino. His two bestselling computer architecture textbooks, co-authored with Dave Patterson, have been widely used for over 30 years. Together, they received the 2017 Turing Award for their contributions to RISC architecture and their influential textbooks.
The Dilemma of Moore’s Law and Dennard Scaling
While Moore’s Law has been a guiding principle in the semiconductor industry for decades, predicting a doubling of transistors every two years, it has been more an aspiration than a law. The end of Dennard Scaling, which predicted constant power per square millimeter as transistors got smaller, has marked a critical turning point. This slowdown, coupled with the shift in application from desktops to mobile and cloud computing, has emphasized the need for energy efficiency. The thermal power limit of processors, leading to reduced clock speeds and core shutdowns, further accentuates the challenge.
Addressing Performance and Energy Efficiency Crises
The slowdown in single-core performance growth and DRAM development has prompted a reevaluation of processor design. Energy efficiency has become paramount, especially in cloud computing, where the capital costs of servers and cooling/power infrastructure are comparable. Thermal power limits further exacerbate the challenge.
The Paradigm Shift to Multicore Processors and DSAs
Instruction-level parallelism (ILP) and Amdahl’s Law have presented challenges in improving performance. ILP’s limits and Amdahl’s Law, which states that parallelization speedup is limited by the non-parallelizable fraction of the program, have driven the industry towards multicore processors. These processors allow parallel execution of multiple threads or programs, improving performance and efficiency. However, this shift brings thermal dissipation and increased energy consumption challenges. RISC’s focus on instruction set efficiency has become crucial for power-sensitive devices, leading to the rise of domain-specific architectures tailored for specific applications like GPUs for graphics and TPUs for deep learning.
The Role of Domain-Specific Architectures
DSAs, exemplified by Google’s TPUs and GPUs, offer substantial performance gains by tailoring architecture to specific applications. These architectures employ approximation techniques like reduced numerical precision, enhancing efficiency in scenarios ranging from deep learning to weather prediction. The integration of DSAs across application, compilation, DSL, and architecture levels is pivotal for overcoming the limits imposed by traditional computing paradigms.
Quantum Computing and Future Trends
Quantum computing emerges as a frontier in the computing landscape, albeit with significant practical difficulties. Maintaining coherent system states and building large, low-error quantum machines pose substantial challenges. Meanwhile, advancements in material science, like carbon nanotubes and 3D stacking, hold promise for improved energy efficiency and performance in traditional computing.
AI and Networking: A Look at the Future
AI networks may come to primarily interconnect systems engaged in learning and training, with networks serving two dominant workloads: moving the enormous datasets used in training and delivering video content. A single modern top-of-rack switch can already handle Netflix’s peak global traffic at any given moment, which puts the scale of training traffic relative to video streaming in perspective. How machine learning functions will be integrated into switches remains open, as does the question of whether networking experts, ML specialists, or a merger of the two communities will lead that work. Meanwhile, the hardware landscape is fragmenting, with different devices, such as cameras and voice-recognition systems, calling for their own specialized ML processors. All of this is still first-generation machine learning; the field is early, its future is uncertain, and it is expected to evolve beyond today’s supervised learning.
Challenges and Potential Solutions in Computer Architecture
Thermal dissipation limits how many cores can be active at once, and those power limits compound the Amdahl’s Law penalty on parallel performance, making instruction set efficiency a key driver for power-constrained devices. Modern programming languages compound the problem by prioritizing programmer productivity over execution efficiency; Python, for example, can be highly inefficient in execution compared to C. With general-purpose processors at a dead end, domain-specific architectures, programmed through domain-specific languages, are the most viable path forward. The potential gains are large: an MIT-led study of matrix multiplication showed that moving to C, parallelizing loops, blocking for the memory hierarchy, and using vector instructions yielded a roughly 65,000-fold speedup. Domain-specific architectures pursue exactly this kind of gain by aligning hardware closely with application requirements, as GPUs do for graphics, network processors for packet processing, and accelerators for deep learning.
Emerging Trends in Computer Architecture
Research is actively exploring ways to reduce energy consumption in computing, from optimizing memory hierarchy usage and minimizing control overhead to employing systolic arrays. Domain-specific architectures have gained traction because they deliver markedly better performance per watt than general-purpose processors, using simpler parallelism models, user-controlled memory hierarchies, and programming models tailored to the hardware; roofline models capture how arithmetic intensity, memory bandwidth, and arithmetic bandwidth together determine achievable performance. Demand for this kind of specialized compute is growing rapidly, driven above all by deep learning. Realizing it requires rethinking the interface between software models and hardware, and it rewards tight integration across the stack: application characteristics, compiler optimization techniques, domain-specific languages, and the underlying architecture.
Conclusion
The computing world is at a pivotal juncture, transitioning from general-purpose processors to specialized architectures and quantum computing. This shift necessitates a rethinking of architectural and interface designs, emphasizing energy efficiency, parallelism, and domain-specific solutions. The future of computing, influenced by lessons from the past and innovations in the present, looks towards a landscape where domain-specific architectures and quantum computing redefine what’s possible in processing power and efficiency.