John Hennessy (Alphabet Chairman) – The End of the Road for General Purpose Processors & the Future of Computing (Jul 2018)
Chapters
00:00:08 The End of Moore's Law and Dennard Scaling: A New Era in Computing
Overview: The seminar honors John Linville’s legacy in the field of electronic systems technology. The opening speakers discuss Linville’s contributions to Stanford University and his impact on electrical engineering. The inaugural lecture by John Hennessy explores the massive changes in computing technology and the challenges posed by the slowdown of Moore’s Law and the end of Dennard scaling.
Electronic Systems Technology: Electronic systems are crucial to the information society, shaping our lives in various ways. The demand for energy-efficient electronic systems is increasing due to the continuous growth of information technology. Electronic systems technology is undergoing major shifts, driven by technological advancements and societal needs.
John Linville’s Legacy: John Linville, a faculty member in Stanford’s Electrical Engineering Department, was a pioneer in transistor circuit design. He joined Stanford in 1954 and established a graduate program in transistor circuit design, shaping the department’s future. Linville played a key role in creating three laboratories at Stanford: a lab where PhD students could build with semiconductor devices, a lab where students could build integrated circuits, and the Center for Integrated Systems, a collaborative laboratory engaging faculty from the EE and CS departments.
John Hennessy’s Inaugural Lecture: John Hennessy, former President of Stanford University, delivered the inaugural lecture. He discussed the massive changes in computing technology, including the evolution of microprocessors, instruction-level parallelism, multi-core processors, and increased clock rates. Hennessy emphasized the challenges posed by the slowdown of Moore’s Law and the end of Dennard scaling. The end of Dennard scaling has created a crisis, as more transistors on a chip lead to increased power consumption. The slowdown of Moore’s Law also affects efficiency, as transistors that are not efficiently used take up space, consume power, and increase costs.
00:13:57 Limits of Semiconductor Scaling and Changing Computer Architectures
Architectural Shifts: Changes in the dominant computing landscape emphasize mobile and cloud-based platforms over traditional desktops.
Slowdown of Moore’s Law: Processor performance improvements have significantly slowed down in recent years, from 52% per year to 3.5%. Energy efficiency has become a critical metric due to the rising demand for portable devices and cloud computing.
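To make the quoted growth rates concrete, a quick back-of-the-envelope check (standard compound-growth arithmetic, not a calculation from the talk) converts annual improvement into doubling time:

```python
# Back-of-the-envelope check (not from the talk): an annual improvement
# rate r implies a doubling time of ln(2) / ln(1 + r) years.
import math

for rate in (0.52, 0.035):
    doubling_years = math.log(2) / math.log(1 + rate)
    print(f"{rate:.1%} per year -> performance doubles every {doubling_years:.1f} years")
# 52%/year: roughly every 1.7 years; 3.5%/year: roughly every 20 years.
```

At the old rate, performance doubled roughly every year and a half; at the current rate, a doubling takes about two decades.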
DRAM and Moore’s Law: DRAM density growth has slowed dramatically, with no announced follow-on spec for DDR5. The aspect ratio of DRAM cells has reached extreme levels, making further scaling challenging.
End of Dennard Scaling: The end of Dennard scaling has led to increasing power consumption and reduced energy efficiency in processors. This has resulted in the need for architectures that are more efficient in terms of power.
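The mechanism behind this is the standard dynamic-power relation, roughly P ≈ αCV²f, which the summary does not spell out. The sketch below uses textbook scaling factors (assumptions for illustration, not figures from the lecture) to show why constant supply voltage breaks the old bargain:

```python
# Illustrative sketch (textbook idealization, not figures from the lecture)
# of why the end of Dennard scaling raises power density. Dynamic power per
# transistor is roughly alpha * C * V^2 * f.
def dynamic_power(alpha, capacitance, voltage, frequency):
    return alpha * capacitance * voltage**2 * frequency

k = 1.4  # one process generation: linear dimensions shrink by ~1/k
base = dynamic_power(alpha=0.1, capacitance=1.0, voltage=1.0, frequency=1.0)

# Classic Dennard scaling: C and V both shrink by 1/k while f rises by k,
# so power per transistor drops by 1/k^2 -- exactly offsetting the k^2
# increase in transistors per unit area. Power density stays constant.
dennard = dynamic_power(0.1, 1.0 / k, 1.0 / k, 1.0 * k)

# Post-Dennard: supply voltage barely improves. If frequency still rose by k,
# power per transistor would not fall at all, so packing k^2 more transistors
# would raise power density every generation (in practice, clock rates and
# the number of active cores get capped instead).
post_dennard = dynamic_power(0.1, 1.0 / k, 1.0, 1.0 * k)

print(f"per-transistor power: {dennard / base:.2f}x vs {post_dennard / base:.2f}x")
print(f"power density (x k^2 more transistors): "
      f"{k * k * dennard / base:.2f}x vs {k * k * post_dennard / base:.2f}x")
```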
Limitations of Traditional Architectural Techniques: Traditional architectural techniques for improving performance, such as instruction-level parallelism and multi-core designs, have reached their limits. Further pushing these techniques yields diminishing returns, especially in terms of energy efficiency.
Instruction-Level Parallelism: Pipelining and multiple issue techniques have been used to increase instruction throughput. However, these techniques have become energy-inefficient due to the large number of instructions in execution and the complexity of the hardware.
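A rough way to see why deep, wide pipelines carry so much speculative state is to count the instructions in flight under ideal conditions. The depths, widths, and instruction count below are illustrative assumptions, not figures from the talk:

```python
# Illustrative sketch: how deeper pipelines and wider issue multiply the
# number of instructions that must be in flight at once.
import math

def ideal_cycles(n_instructions, pipeline_depth, issue_width):
    # Fill the pipeline once, then retire `issue_width` instructions per cycle.
    return pipeline_depth + math.ceil(n_instructions / issue_width) - 1

def instructions_in_flight(pipeline_depth, issue_width):
    # Roughly issue_width instructions occupy each of the pipeline's stages.
    return pipeline_depth * issue_width

for depth, width in [(5, 1), (15, 4), (20, 6)]:
    cycles = ideal_cycles(10_000, depth, width)
    print(f"depth={depth:2d} width={width}: "
          f"~{instructions_in_flight(depth, width):3d} in flight, "
          f"{10_000 / cycles:.2f} instructions/cycle (ideal)")
```

The throughput gains come only if all of that in-flight work is useful, which is exactly what speculation puts at risk.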
Energy Crisis in Processors: The slowdown in Moore’s Law and the end of Dennard scaling have created a crisis for processors. Traditional architectural techniques are no longer sufficient to address the need for energy efficiency. New approaches are needed to design architectures that are more efficient in terms of power.
00:25:50 Speculation and Its Consequences in Modern Processors
The Challenges of Branch Prediction and Speculation: Modern processors can have 120 to 140 instructions in flight at once, but branches make it difficult to sustain high prediction accuracy across that much speculative work. Speculating on branch outcomes leads to wasted work and energy whenever a prediction is wrong, and the cost of flushing and restarting the pipeline after a misprediction is significant.
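For readers unfamiliar with how dynamic prediction works, here is a minimal sketch of a classic two-bit saturating-counter predictor; the table size and the toy branch trace are assumptions for illustration, since the talk does not describe a specific predictor design:

```python
# Minimal sketch of a 2-bit saturating-counter branch predictor.
class TwoBitPredictor:
    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [2] * entries  # 0-1 predict not-taken, 2-3 predict taken

    def predict(self, pc):
        return self.counters[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

# A loop branch that is taken 99 times, then falls through once, repeatedly.
trace = ([True] * 99 + [False]) * 100
predictor, correct = TwoBitPredictor(), 0
for outcome in trace:
    correct += (predictor.predict(0x400) == outcome)
    predictor.update(0x400, outcome)
print(f"prediction accuracy: {correct / len(trace):.1%}")
```

Even at roughly 99% accuracy on a simple loop branch, each remaining misprediction flushes a hundred-plus in-flight instructions, and all of that discarded work has already consumed energy.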
The Limits of Instruction-Level Parallelism: Although theoretical parallelism exists, it is challenging to achieve in practice due to the difficulty of accurately predicting branch outcomes. The pursuit of instruction-level parallelism reached a limit where further improvements were difficult to achieve.
The Shift to Multi-Core Processors: To overcome the limitations of instruction-level parallelism, researchers and architects turned to multi-core processors. Multi-core processors run separate threads designated by the programmer, allowing for true parallel execution of tasks. The programmer is responsible for identifying and managing the parallelism in the code.
Conclusion: The end of the pursuit of instruction-level parallelism marked a significant shift in the design of computer processors. The transition to multi-core processors enabled a new era of parallelism, where programmers are responsible for exploiting parallelism in their code to achieve performance gains.
00:30:40 Amdahl's Law and the Limits of Multi-Core Scaling
Amdahl’s Law: Amdahl’s Law, proposed by Gene Amdahl in 1967, states that the speedup of a program running on a parallel computer is limited by the portion of the program that can only be executed sequentially.
Multi-Core Processors: Modern multi-core processors consist of multiple separate processors (cores) with their own caches, interconnected by a network, memory control, and I/O. Each core can be designed independently, allowing for scaling by adding more cores.
Amdahl’s Law Effect on Multi-Core Processors: The Amdahl’s Law effect is significant in multi-core processors. The speedup is limited by the fraction of the code that can only run on a single core. For example, if 10% of the code can only run on one core, the maximum speedup is limited to 10x, even if there are 64 cores.
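The usual formulation, speedup = 1 / ((1 − p) + p/n) for parallel fraction p on n cores, makes the 10x ceiling quoted above easy to verify (a minimal sketch, not code from the talk):

```python
# Amdahl's Law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
# the program that can run in parallel and n is the number of cores.
def amdahl_speedup(parallel_fraction, cores):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# 10% of the code is sequential, i.e. the parallel fraction is 0.9:
print(f"{amdahl_speedup(0.9, 64):.1f}x on 64 cores")       # ~8.8x
print(f"{amdahl_speedup(0.9, 10**6):.1f}x in the limit")    # approaches 1/0.1 = 10x
```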
Overcoming Amdahl’s Law: Overcoming Amdahl’s Law in a general-purpose computing environment is challenging. Attempts to solve the problem for large workloads often encounter new instances of Amdahl’s Law. Coordination and synchronization between processes can create Amdahl’s Law bottlenecks.
Power Consumption and Amdahl’s Law: Cores waiting for a single core to complete a sequential portion of the code consume power while idle. Shutting down idle cores is not a viable solution due to the time required to restart them.
Conclusion: Amdahl’s Law poses a significant challenge in scaling multi-core processors. The need to coordinate and synchronize processes exacerbates the problem. The power consumption of idle cores further complicates the situation.
00:35:23 The End of Dennard Scaling: Dark Silicon and Packaging Challenges
Dark Silicon: The end of Dennard scaling, which described how power scaled down along with transistor dimensions, also marks the end of multicore scaling as it has been traditionally done. This leads to the phenomenon of “dark silicon,” where cores are turned off to save energy and reduce heat generation.
Core Turn-off: A core that has been turned off takes millions of clock cycles to bring back up, so the decision of when to power one down must be made carefully.
Power Consumption and Packaging: Current large multicore chips already have clock rates significantly lower than smaller chips due to heat dissipation constraints. Projecting future chip designs to smaller process nodes, such as 11 nanometers, would result in even higher power consumption and the need to turn off even more cores. Packaging technology improvements have been relatively slow, limiting the amount of power that can be dissipated by a chip.
Active Core Count: Even with aggressive assumptions about packaging improvements, only a fraction of the total cores on a future chip could be active at any given time due to power constraints.
00:37:40 Constrained Computing: Challenges and Potential Solutions
Power and Heat Limitations: The number of active cores in a processor is limited by power consumption and heat dissipation. At 180 watts, only 60 cores can be active, and at 200 watts, 65 cores can be active. Liquid cooling may be necessary to remove heat effectively and allow for more active cores.
Impact of Power and Amdahl’s Law: The combination of power limitations and Amdahl’s Law results in a grim outlook for multicore scaling. With 96 processors and only 1% sequential code, the speedup is limited to about 38, less than half the processor count, for an efficiency below 50%.
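The 38x figure appears to follow from combining the power-limited active-core count above with Amdahl’s Law; pairing 60 active cores (the 180-watt case) with 1% sequential code is an inference from this summary, not a quotation from the talk:

```python
# Sketch of the arithmetic that seems to lie behind the "speedup of 38"
# figure: a 96-core chip whose power budget allows only 60 active cores,
# running code that is 1% sequential. (The pairing of these numbers is an
# inference, not a quote from the lecture.)
def amdahl_speedup(parallel_fraction, cores):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

active_cores = 60                               # limited by power, not by the die
speedup = amdahl_speedup(0.99, active_cores)
print(f"~{speedup:.0f}x speedup on a 96-core chip, "
      f"{speedup / 96:.0%} efficiency")          # ~38x, well under 50% efficiency
```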
“Dark Multicore” and “Dark Silicon”: The term “dark multicore” or “dark silicon” refers to the challenge of utilizing all the cores in a processor due to power and heat constraints. There is currently no straightforward solution to overcome this problem.
Challenges for General Purpose Processors: The failure of Dennard scaling means that any inefficiency in processor design translates into real problems in terms of performance improvement. The long-standing approach of buying speed by spending ever more transistors, and the power they burn, is no longer sustainable. There is no obvious path forward for general-purpose processors due to these challenges.
Alternative Approaches: A draft paper from MIT titled “There’s Plenty of Room at the Top” suggests returning to efficiency in software rather than focusing solely on programmer productivity. Rewriting code in a more efficient language like C can yield significant performance improvements; the trade-off is giving up some of the productivity gains of higher-level software in exchange for better performance.
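As a hedged illustration of that efficiency-versus-productivity trade-off: the same matrix multiply written as interpreted Python loops versus a call that dispatches to optimized native (C/assembly) BLAS code. The matrix size and the use of NumPy are choices made here for illustration, not examples taken from the talk or the paper:

```python
# Same computation, two levels of software efficiency.
import time
import numpy as np

n = 256
a, b = np.random.rand(n, n), np.random.rand(n, n)

def naive_matmul(a, b):
    # Straightforward triple loop, executed by the Python interpreter.
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

t0 = time.perf_counter(); naive_matmul(a, b); t1 = time.perf_counter()
np.dot(a, b);                                  t2 = time.perf_counter()
print(f"interpreted loops: {t1 - t0:.2f} s   native BLAS: {t2 - t1:.4f} s")
```

The gap between the two timings is the kind of headroom the MIT authors argue still exists “at the top” of the stack.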
Another Potential Route: Hennessy points to another potential route forward, one that requires a different way of thinking about the problem, which he develops in the remainder of the lecture.
00:41:29 Domain-Specific Architectures for Enhanced Performance
Advantages of Domain-Specific Architectures: Improved performance and efficiency due to better parallelism, memory bandwidth utilization, and elimination of unnecessary precision. Tailored to specific domains, enabling customization for a family of related tasks. Programmable, unlike ASICs, allowing for flexibility and adaptability to changing requirements.
Challenges of Domain-Specific Architectures: Maintaining a niche advantage over general-purpose architectures, as history shows that special-purpose machines often struggle to sustain their lead. Developing domain-specific programming models that enable software to align with the hardware’s capabilities. Creating a diverse range of architectures, potentially leading to increased complexity and fragmentation in the computing landscape.
Key Principles for Effective Domain-Specific Architectures: Employing SIMD (single instruction, multiple data) parallelism for increased efficiency and simplicity. Utilizing software analysis to determine parallelism, reducing the burden on hardware. Optimizing memory usage through user-controlled memories and eliminating caches when appropriate. Reducing unnecessary precision by using smaller data units and relaxed accuracy requirements.
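As one concrete illustration of the reduced-precision principle, here is a minimal sketch that quantizes 32-bit floating-point values down to 8-bit integers. The simple symmetric scaling scheme is an assumption chosen for illustration; the talk does not prescribe a particular quantization method:

```python
# Reduced precision in miniature: float32 weights stored as int8.
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127.0                     # map observed range onto int8
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
restored = quantized.astype(np.float32) * scale

print("max abs error:", np.abs(weights - restored).max())
print("storage:", weights.nbytes, "bytes ->", quantized.nbytes, "bytes")  # 4x smaller
```

Smaller data units mean more operands per memory access and far less energy per arithmetic operation, which is where much of a DSA’s advantage comes from.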
Importance of a Domain-Specific Programming Model: Matching software requirements with hardware capabilities is crucial for achieving performance gains. Historical examples, such as the ILLIAC-IV, highlight the need for a close relationship between software and hardware design.
Implications for Future Architectures: Architects need to think differently about performance optimization, considering the algorithms and structures of specific domains. The focus should shift from low-level software interfaces to understanding and leveraging the structure of programs. This approach could lead to a proliferation of specialized architectures, posing challenges for system design and integration.
00:50:11 Architecture Innovations for Future Computing
Specialized Architectures for Different Applications: Specialized architectures optimized for specific applications, such as machine learning and deep neural networks related to driving, are becoming increasingly important. Examples include: giant machines in the cloud for general-purpose deep neural network tasks like speech recognition, image recognition, and medical diagnosis; phones with processors designed for speech recognition; and virtual reality headsets with processors optimized for virtual and augmented reality applications.
Collaboration Between Algorithm Designers and Hardware/Software Experts: To effectively utilize specialized architectures, collaboration between algorithm designers, application experts, software developers, and hardware engineers is essential.
Design Cost Considerations: Designing multiple specialized architectures can be costly. Efforts should be made to reduce the design costs of these architectures.
Rethinking Hardware-Software Interfaces: Rethinking the interfaces between hardware and software can help bridge the gap between specialized architectures and traditional silicon-based computing.
Continued Innovation in Silicon-Based Computing: Ongoing innovation in silicon-based computing is necessary to maintain the benefits of Moore’s Law and ensure a smooth transition to specialized architectures.
Conclusion: The future of computing involves specialized architectures optimized for specific applications, collaboration between algorithm designers and hardware/software experts, and continued innovation in silicon-based computing.
Abstract
Navigating the Post-Moore Era: Rethinking Electronic Systems and Computer Architecture
In an era marked by the waning influence of Moore’s Law and Dennard scaling, the inaugural lecture of John Hennessy at the Distinguished Seminar on Electronic Systems Technology has brought into sharp focus the urgent need for a paradigm shift in computer architecture and electronic system design. Hennessy’s insights, reflecting on John Linville’s pioneering legacy at Stanford and the evolving landscape of computing, highlight the critical challenges and opportunities ahead. From the constraints of instruction-level parallelism and Amdahl’s Law to the emerging potential of Domain-Specific Architectures (DSAs) and dark silicon issues, the lecture encapsulates a fundamental transformation in how we conceive, design, and utilize electronic systems in a rapidly changing technological landscape.
—
Introduction of the Distinguished Seminar on Electronic Systems Technology:
The seminar, organized to honor the legacy of John Linville, emphasized the importance of energy-efficient systems and acknowledged Linville’s lasting impact on Stanford and engineering. These remarks provided historical context and highlighted current challenges faced by the field, setting the stage for Hennessy’s lecture.
John Linville’s Legacy at Stanford:
John Linville, a visionary recruited by Fred Terman in 1954, was instrumental in developing Stanford’s program on transistor applications. His establishment of three new laboratories and the Center for Integrated Systems (CIS) underscored the interdisciplinary nature of modern electronic systems, blending electrical engineering and computer science.
John Hennessy’s Inaugural Lecture:
Hennessy’s lecture served as a crucial turning point, drawing attention to the massive changes in computing driven by technology and architecture. He nostalgically referred to the ‘golden age of computing,’ while soberly acknowledging the slowdown of Moore’s Law and Dennard scaling, signaling a crisis in power consumption and efficiency.
The End of Moore’s Law and Dennard Scaling:
The slowing pace of Moore’s Law and the invalidation of Dennard scaling present formidable challenges in designing efficient electronic systems. This transition demands a fresh approach to computing, focusing on efficiency, architecture, and interdisciplinary collaboration.
The Changing Landscape of Computer Architecture:
The field of computer architecture is undergoing a profound transformation. Architectural limits, a shifting application landscape, and the priority of energy efficiency are reshaping how we approach modern architectures, especially in the context of mobile devices and cloud-based data centers.
The Slowdown of Moore’s Law and Its Implications:
Challenges in DRAM scaling, slowing growth in transistor counts, and the end of Dennard scaling have pushed energy efficiency to the forefront. This slowdown necessitates a fundamental reevaluation of processor designs.
Instruction-Level Parallelism and Its Limits:
The diminishing returns of ILP techniques like pipelining and multiple issue, in terms of energy efficiency, signal the need for new architectural approaches that go beyond traditional techniques.
Branch Prediction and Instruction-Level Parallelism Challenges:
Branch prediction, a key feature of modern processors, illustrates the inherent difficulties and inefficiencies in maximizing ILP. The increasing complexity and overhead of managing incorrect predictions demonstrate the practical limits of this approach.
Shift to Multi-Core Era and Amdahl’s Law:
The transition to multi-core processors, driven by the limits of ILP and the challenges of branch prediction, underscores the pivotal role of Amdahl’s Law. This law highlights the diminishing returns of parallel processing, especially when a fraction of the program remains sequential.
Dark Silicon and Power Consumption:
The concept of ‘dark silicon,’ arising from the end of Dennard scaling, illustrates the challenges in multicore scaling and power management. The limitations of packaging technology and the consequent power and efficiency limits have significant implications for the design and utilization of multicore systems.
Amdahl’s Law and Its Impact on Multi-Core Processors:
Amdahl’s Law states that the speedup of a program running on a parallel computer is limited by the portion of the program that can only be executed sequentially. In multi-core processors, this effect is significant as the speedup is limited by the fraction of the code that can only run on a single core. Overcoming Amdahl’s Law in a general-purpose computing environment is challenging, and attempts to solve the problem often encounter new instances of the law. Coordinating and synchronizing processes can also create Amdahl’s Law bottlenecks.
The End of Dennard Scaling and the Rise of Dark Silicon:
The end of Dennard scaling marks the end of multicore scaling as it has been traditionally done, leading to the phenomenon of “dark silicon,” where cores are turned off to save energy and reduce heat generation. A core that has been turned off takes a significant amount of time to bring back up, so the decision of when to power one down must be made carefully. Power consumption and heat dissipation are major challenges in multicore scaling, and even with aggressive assumptions about packaging improvements, only a fraction of the total cores on a future chip could be active at any given time.
Challenges in Multicore Scaling and Potential Solutions:
Power consumption and heat dissipation limit the number of active cores in a processor, with liquid cooling being a potential way to remove heat more effectively and allow more active cores. The combination of power limitations and Amdahl’s Law results in a grim outlook for multicore scaling. Alternative approaches, such as returning to efficiency in software and rewriting code in a more efficient language like C, are potential routes for addressing these challenges.
Domain-Specific Architectures (DSAs) and Their Advantages:
Domain-Specific Architectures (DSAs) offer a solution to the limitations of general-purpose architectures by optimizing hardware and software for specific domains. Advantages of DSAs include improved performance and efficiency due to better parallelism, memory bandwidth utilization, and elimination of unnecessary precision. They can be tailored to specific domains, enabling customization for a family of related tasks. DSAs are programmable, unlike ASICs, allowing for flexibility and adaptability to changing requirements.
Challenges of Domain-Specific Architectures:
Challenges in developing DSAs include maintaining a niche advantage over general-purpose architectures, developing domain-specific programming models that enable software to align with the hardware’s capabilities, and creating a diverse range of architectures, potentially leading to increased complexity and fragmentation in the computing landscape.
Key Principles for Effective Domain-Specific Architectures:
Effective DSAs employ SIMD (single instruction, multiple data) parallelism for increased efficiency and simplicity, use software analysis to determine parallelism, optimize memory usage through user-controlled memories (eliminating caches when appropriate), and reduce unnecessary precision by using smaller data units and relaxed accuracy requirements.
Importance of a Domain-Specific Programming Model:
A domain-specific programming model is crucial for matching software requirements with hardware capabilities and achieving performance gains. Historical examples, such as the ILLIAC-IV, highlight the need for a close relationship between software and hardware design.
Implications for Future Architectures:
Future architectures will require architects to think differently about performance optimization, considering the algorithms and structures of specific domains. The focus should shift from low-level software interfaces to understanding and leveraging the structure of programs. This approach could lead to a proliferation of specialized architectures, posing challenges for system design and integration.
Specialized Architectures for Different Applications:
Specialized architectures optimized for specific applications, such as machine learning and deep neural networks related to driving, are becoming increasingly important. Examples include giant machines in the cloud for general-purpose deep neural network tasks, phones with processors designed for speech recognition, and virtual reality headsets with processors optimized for virtual and augmented reality applications.
Collaboration Between Algorithm Designers and Hardware/Software Experts:
To effectively utilize specialized architectures, collaboration between algorithm designers, application experts, software developers, and hardware engineers is essential.
Design Cost Considerations:
Designing multiple specialized architectures can be costly. Efforts should be made to reduce the design costs of these architectures.
Rethinking Hardware-Software Interfaces:
Rethinking the interfaces between hardware and software can help bridge the gap between specialized architectures and traditional silicon-based computing.
Continued Innovation in Silicon-Based Computing:
Ongoing innovation in silicon-based computing is necessary to maintain the benefits of Moore’s Law and ensure a smooth transition to specialized architectures.
Hennessy’s lecture at the Distinguished Seminar on Electronic Systems Technology, reflecting on John Linville’s legacy and the current challenges in electronic systems and computer architecture, marks a critical juncture. The industry stands at a crossroads, with the slowdown of Moore’s Law and the rise of DSAs and dark silicon shaping the future of computing. As we navigate this post-Moore era, a rethinking of hardware-software paradigms, a focus on energy efficiency, and interdisciplinary collaboration emerge as key pathways forward.