Danny Hillis (Thinking Machines Corporation Co-founder) – UCLA Distinguished Lecturer Series (Nov 1992)
Chapters
00:00:57 Emergent Behavior and the Promise of Massive Parallelism
Introduction: Leonard Kleinrock introduces the lecture by Danny Hillis, a renowned computer scientist known for his work on the Connection Machine and massive parallelism.
Motivational Factors: Hillis’ childhood experiences with limited building blocks and his father’s medical research in various countries influenced his interest in biology, neurobiology, and independent behavior.
Educational Background: Hillis began as a biology student at MIT but later switched to mathematics, developing an interest in simulating neurons and computers. He worked as a game designer at Milton Bradley and designed chips for video games while pursuing his studies.
Collaboration with Marvin Minsky: Hillis sought to work with Marvin Minsky, a well-known AI researcher. He impressed Minsky by fixing errors in his circuit design and eventually joined his laboratory. He completed his PhD studies in massive parallelism at MIT.
Development of the Connection Machine: Hillis’ research focused on parallel processing and resulted in the concept of the connection machine. He faced challenges in obtaining funding from big companies like IBM and Cray. He eventually secured funding from DARPA and founded Thinking Machines Corporation.
Interaction with Claude Shannon: Hillis shared a common interest in toys with Claude Shannon, a prominent information theorist and juggler. Hillis’ creative and unusual means of transportation during visits to Shannon’s house gained media attention.
Emergent Behavior: Hillis emphasized the importance of understanding emergent behavior, where simple individual elements combine to create complex outcomes. He believed this concept holds great promise for research and has practical applications in various fields.
Conclusion: Hillis expressed his enthusiasm for parallel processing and emphasized the importance of enjoying one’s research work. He prepared to deliver a lecture on massive parallelism and share his insights and findings in the field.
00:11:52 Common Misconceptions about Parallel Processing
Initial Myths about Parallel Processing: Many problems were believed to be fundamentally sequential in nature, like the pointer-chasing (linked-list traversal) problem. Sequential processing was considered more efficient and easier to program. Parallel processing was thought to be suitable only for highly specialized applications. There was a concern that increasing the number of processors would lead to a proportional increase in communication overhead.
Unveiling the Truths: Many problems initially perceived as sequential were later found to have inherent parallelism. Parallel processing proved to be efficient and effective for a wide range of applications, not just specialized ones. Advances in technology and programming techniques reduced communication overhead, making parallel processing more scalable. The focus shifted from maximizing processor count to optimizing communication and synchronization strategies.
Exploring a Different Mythical Area: Hillis hinted at another area where myths still prevail, suggesting that there is potential for further exploration and discovery.
00:15:42 Common Misconceptions About Parallel Processing
Arguments Against Parallel Processing:
– Amdahl’s Law: The argument that using multiple processors won’t significantly speed up a problem if a portion of it is inherently sequential.
– Direct Methods: The belief that numerical problems solved through direct methods, involving algebraic manipulations, were fundamentally sequential.
– New Programming Languages: The assumption that parallel computers would require entirely new programming languages due to the need for synchronization and preventing conflicts between processors.
– Programming Difficulty: The belief that programming parallel computers would become increasingly difficult as the number of processors increased.
– Input/Output Limitations: The argument that input and output operations, particularly disk drives, were limited by mechanical constraints, offsetting any gains from faster processing.
– SIMD vs. MIMD: The assumption that most problems required multiple processors to perform different tasks, favoring MIMD (multiple instruction, multiple data) architectures over SIMD (single instruction, multiple data) architectures.
– Hardware Efficiency: The belief that single instruction, multiple data architectures were more hardware-efficient and simpler to design.
– Fastest Technology: The idea that the fastest computers should be built using the fastest available technology.
00:23:08 Parallel Processing: A Personal Journey Through the Minefield
Early Perceptions of Semiconductor Technology: Non-saturated bipolar technology (ECL transistors) was seen as the fastest but power-hungry. CMOS technology, with its low power consumption, was considered slow and suitable for simple applications like toys and calculators.
Hillis’ Determination to Explore Parallel Processing: Despite prevailing skepticism, Hillis remained driven by a personal interest in solving problems in artificial intelligence. Biological evidence, such as the fast processing capabilities of neurons, provided a compelling reason to believe in the potential of parallel processing.
Arguments Against Parallel Processing: Conventional wisdom suggested that parallel processing was not a viable approach for general-purpose computers. Critics argued that slow components would result in slow performance, making parallel processing impractical.
Hillis’ Initial Approach: He initially focused on developing algorithms expressed in parallel, anticipating the future availability of suitable machines. Early work involved common sense reasoning problems and later expanded to vision-related issues.
Exposure to CMOS Technology: Hillis gained experience with CMOS technology through toy design projects. CMOS, with its ability to integrate tens of thousands of transistors on a chip, presented potential for future applications.
00:26:24 Connection Machines: From Neural Nets to Physics Simulations
Neural Metaphor and the Desire for General Computation: Danny Hillis was inspired by the brain’s neural networks but didn’t take the metaphor literally, given the uncertainty about how neurons actually work. He focused on creating a general-purpose processor with flexible communication to perform any computation.
Generalization Challenges: Initial skepticism and focus on solving specific problems, not general-purpose computing. Concerns about efficiency and limitations of single-bit processors for arithmetic operations.
Feynman’s Challenge and the QCD Problem: Richard Feynman, who spent his summers working at Thinking Machines, challenged the notion of a special-purpose machine. Feynman wrote a program for the Connection Machine to solve quantum chromodynamics (QCD) problems.
Unexpected Performance Discovery: Feynman’s calculations revealed that the Connection Machine’s parallelism could outperform dedicated machines on the QCD problem. The massive number of processors compensated for the lack of dedicated arithmetic units.
Realization of Broader Applicability: This discovery hinted at the machine’s potential for solving general numerical problems, not just neural network processing. It opened up the possibility of addressing a larger market beyond neural networks.
Next Surprises and the Path to General-Purpose Computing: The journey towards a general-purpose Connection Machine continued with further surprises and developments.
00:32:38 Parallel Processing: Uncovering Hidden Potential and Efficiency Gains
Trivializing Parallel Processing: The realization that many problems thought to be inherently sequential could be solved in parallel emerged gradually. The key to unlocking parallelism often lies in shifting the perspective from manipulating data to considering data as an active participant.
Connection Machine’s Architectural Advantage: In a Connection Machine, each memory location has an associated processor, allowing for distributed processing.
The Pointer Trick: A fundamental technique in parallel processing involves manipulating pointers in all processors at once. In a single constant-time step, each processor can send its own address to the processor whose address it holds, turning a singly linked list into a doubly linked one; with pointer jumping, each processor instead replaces its pointer with its successor’s pointer, halving the length of every chain per step and turning an apparently linear traversal into a logarithmic one.
Widespread Applicability of the Pointer Trick: The pointer trick is a versatile technique applicable to many problems previously considered sequential. It is often built into parallel processing primitives or explicitly implemented in code.
Practical Example: The pointer trick can be used to efficiently compute subtotals of a list of numbers.
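A minimal sketch of that subtotal example, in plain Python rather than anything from the lecture itself: the doubling idea behind the pointer trick (often called a Hillis–Steele scan) computes all running subtotals of N numbers in roughly log N parallel passes instead of N sequential additions.

```python
def prefix_sums(values):
    """Inclusive running subtotals via repeated doubling of each element's 'reach'."""
    sums = list(values)
    step = 1
    while step < len(sums):
        # On a parallel machine every element would do this update simultaneously;
        # here we read from a snapshot of the previous pass to simulate that.
        previous = list(sums)
        for i in range(step, len(sums)):
            sums[i] = previous[i] + previous[i - step]
        step *= 2  # each pass doubles how far information has travelled
    return sums

print(prefix_sums([3, 1, 4, 1, 5, 9]))  # [3, 4, 8, 9, 14, 23]
```

Each pass is a single data-parallel step, so a million subtotals need about twenty passes rather than a million dependent additions.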
The Fallacy of Extensive Code Modification: Contrary to initial assumptions, modifying programming languages for parallel processing turned out to be minimal. Most of the parallelism could be achieved with existing constructs, making the transition smoother than anticipated.
Conclusion: The discovery of these parallel processing techniques challenged conventional wisdom and opened up new possibilities for solving complex problems efficiently.
00:36:06 Data Parallel Processing: A Paradigm for Simple Parallel Programming
Data-Parallel Programming Overview: Data-parallel programming is a technique that allows the expression of parallelism in a program by operating on a whole bunch of data at once. It maintains the sequential model of computation, where one step happens after another, but within each step, a lot of data can be operated on simultaneously. This simplifies programming and avoids issues like synchronization, deadlock, and complex processor coordination.
Key Concept: Iterative Parallelism: In many cases, parallelism is iterated over all the data in a model. A classic example is processing a two-dimensional image with millions of pixels, where the time-consuming task is iterating through and processing each pixel one by one.
Expressing Parallelism in Programming: Data-parallel programming generalizes the programming language to allow the expression of operations on a whole bunch of data at once. This can be done by replacing loops with expressions that operate on entire arrays or data structures. For instance, instead of a loop that assigns values to individual elements of an array, a single expression can be used to assign values to the entire array.
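As a hedged illustration of that loop-to-expression shift (using NumPy as a stand-in for a data-parallel language; the brightness adjustment is an invented example, not one from the lecture):

```python
import numpy as np

image = np.random.rand(1024, 1024)  # stand-in for a two-dimensional image

# Sequential style: visit every pixel one at a time.
brightened_loop = np.empty_like(image)
for row in range(image.shape[0]):
    for col in range(image.shape[1]):
        brightened_loop[row, col] = min(image[row, col] * 1.2, 1.0)

# Data-parallel style: one expression over the entire array, which a
# data-parallel machine could apply to every pixel simultaneously.
brightened_parallel = np.minimum(image * 1.2, 1.0)

assert np.allclose(brightened_loop, brightened_parallel)
```

The second form says nothing about processors or synchronization; it simply states what should happen to all of the data, which is what makes it equally easy to compile for one processor or for thousands.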
Benefits of Data-Parallel Programming: It simplifies the expression of parallelism in programs. It enables the exploitation of parallelism without having to worry about complex processor coordination and synchronization issues. It allows programmers to focus on the problem at hand without getting bogged down in the details of parallel programming.
Historical Context: Data-parallel programming emerged as a response to the challenges of scaling up the number of processors in parallel computers. It addressed the problems associated with synchronization and deadlock that arise when multiple processors are working on different parts of a program concurrently.
00:39:59 Parallel Processing: Grain Size and Amdahl’s Law
Amdahl’s Law and the Grain Size Argument: Conventional wisdom suggested that increasing the number of processors would lead to diminishing returns due to Amdahl’s law and the grain size argument. Hillis argues that the grain size argument was wrong because, with data parallelism, more processors meant faster computation. Amdahl’s law was also less relevant because the potential parallelism scaled with the amount of data.
Amdahl’s Law in Practice: Amdahl’s law states that the speedup from parallelization is limited by the sequential portion of the problem. Hillis argues that the law is mathematically correct but practically irrelevant: in practice, users typically want to solve bigger problems, not just solve the same problem faster.
Scaling Up Parallel Problems: Scaling up a parallel problem by increasing the data size leads to a significant increase in the parallel portion of the problem. The sequential portion remains relatively constant, resulting in a significant overall speedup.
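A compact way to state both halves of that argument (standard formulations, not equations taken from the lecture), writing s for the sequential fraction of the work and n for the number of processors:

```latex
% Amdahl's law: with a fixed-size problem, the sequential fraction s caps the speedup.
\[
  S_{\mathrm{fixed}}(n) \;=\; \frac{1}{\,s + \frac{1-s}{n}\,} \;\le\; \frac{1}{s}
\]
% Hillis' counterpoint, usually credited to Gustafson as "scaled speedup":
% if the parallel work grows with the data while the sequential part stays
% roughly constant, the achievable speedup grows nearly linearly with n.
\[
  S_{\mathrm{scaled}}(n) \;=\; s + (1-s)\,n
\]
```

The first formula assumes the problem stays the same size as processors are added; the second captures what users actually did, which was to grow the problem with the machine.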
The Exception: The only exception to the general trend was when the sequential portion of the problem was very large. In such cases, parallelization would not provide significant speedup.
Conclusion: Hillis concludes that conventional wisdom about parallel computing was often incorrect. The focus should be on solving bigger problems rather than just solving the same problem faster. Because the sequential portion stays roughly constant as problems scale, parallelization provides significant speedup for nearly all problems of realistic size.
Scaling Input/Output (I/O): Initially, I/O was considered a bottleneck in parallel processing, as it often scaled with the data size. The development of the data vault (a disk array) parallelized I/O by striping data across numerous small disks operating in parallel. Error-correcting codes (ECC) were employed to preserve data integrity despite individual disk failures. This approach allowed I/O to scale with the processing part, improving overall system performance.
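A toy sketch of the striping-plus-redundancy idea behind the data vault (illustrative Python only; the real data vault used many more drives and a true error-correcting code rather than simple parity):

```python
from functools import reduce

def stripe(data: bytes, n_disks: int):
    """Round-robin the bytes across n_disks data 'disks', plus one parity 'disk'."""
    disks = [bytearray() for _ in range(n_disks)]
    for i, byte in enumerate(data):
        disks[i % n_disks].append(byte)
    length = max(len(d) for d in disks)           # pad so parity is well defined
    for d in disks:
        d.extend(b"\x00" * (length - len(d)))
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    return disks, bytearray(reduce(xor, disks))

def recover(disks, parity, lost_index):
    """Rebuild a lost disk by XOR-ing the surviving disks with the parity stripe."""
    xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))
    survivors = [d for i, d in enumerate(disks) if i != lost_index] + [parity]
    return bytearray(reduce(xor, survivors))

disks, parity = stripe(b"massively parallel input/output", 4)
assert recover(disks, parity, lost_index=2) == disks[2]
```

Because every small disk transfers its stripe at the same time, bandwidth scales with the number of disks, and the redundant stripe keeps a single failure from losing data.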
Technological Considerations: Bipolar technology was initially favored for its speed advantage over CMOS technology. Large disks were deemed more efficient than small disks. However, economic factors played a crucial role in technology development. CMOS technology benefited from investments in video games, personal computers, and calculators. Volume production made CMOS more cost-effective than bipolar technology.
Parallelism and Cost-Effectiveness: By utilizing high-volume, low-cost technologies in parallel, it became possible to achieve high-performance computing. Parallelism allowed for the integration of cost-effective technologies into high-performance applications. This concept transformed high-volume technologies into high-performance solutions.
SIMD (Single Instruction, Multiple Data): SIMD architecture faced skepticism due to its limited applicability. The realization that many problems could be decomposed into independent tasks suitable for SIMD processing changed perceptions. SIMD’s ability to handle massive parallelism efficiently made it a viable approach for a wide range of problems.
00:49:43 The Evolution of SIMD and its Limitations
Background: The evolution from the CM2 toward the CM5 involved a shift from strictly lockstep SIMD execution to a data parallel model that keeps its sequential semantics without requiring every instruction to execute in lockstep.
Arguments for a Data Parallel Model: The main argument for a data parallel model was to avoid the confusion and deadlocks caused by multiple programs running in different processors at the same time. It provided a cleaner conceptual approach, with well-defined moments in time at which everything is known to be in step.
Advantages of SIMD Machines: At the time, SIMD machines were simpler to build from a hardware standpoint due to technological limitations. The amount of hardware required for control was trivial compared to the size of the chips, making them more cost-effective.
Challenges of SIMD Machines: The SIMD model over-synchronizes things, causing serialization steps where processors sit idle due to conditional statements. This synchronization becomes more problematic when nesting conditional statements, leading to awkward programming and reduced efficiency.
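A small illustration of that serialization effect (NumPy masking as a stand-in for SIMD execution; the branch expressions are invented):

```python
import numpy as np

values = np.array([-3.0, 1.5, -0.2, 4.0])

# SIMD-style masked execution: BOTH branch expressions are computed for every
# element, and a mask picks which result each element keeps, so processors
# whose data takes the other branch effectively sit idle.
result_simd_style = np.where(values < 0, values * -2.0, values + 1.0)

# MIMD-style execution lets each processor run only the branch its own data
# needs; ordinary sequential Python at least shows which branch applies where.
result_mimd_style = np.array([v * -2.0 if v < 0 else v + 1.0 for v in values])

assert np.allclose(result_simd_style, result_mimd_style)
```

Nesting such conditionals multiplies the wasted branches, which is the awkwardness the improved model described next avoids.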
Improved Model: The data parallel model’s strength lies in its sequential semantics in terms of dependencies, not in every instruction being executed sequentially. Synchronization is only necessary where processors communicate, allowing for parallelism in other parts of the program.
Requirements for a Better Model: A fast way of synchronizing things when needed. A good parallel network for communication between processors. Special hardware support for broadcasting programs to all processors simultaneously.
Conclusion: The data parallel model was adopted to avoid confusion and deadlocks in parallel programming. However, the limitations of SIMD machines in handling conditional statements led to the realization that synchronization is only necessary at specific points of communication, opening up possibilities for more efficient parallel processing.
00:55:06 The CM5: A More General Massively Parallel Computer
The CM5: An Overview: The CM5, like its predecessor, the CM2, is a massively parallel processing machine, but with significant advancements. It consists of numerous RISC processing nodes joined by a data network and a control network; the data network is a fat tree rather than a single shared bus, while the control network provides fast broadcast and global synchronization. The CM5 addresses the limitations of the CM2 by allowing greater flexibility in synchronization and by using more powerful RISC processors.
Programming the CM5: The CM5 can be programmed in a data parallel manner, similar to the CM2. Compilers optimize synchronization requirements, only aligning operations when necessary. This approach provides the illusion of a single instruction multiple data (SIMD) machine, while maintaining the efficiency of multiple instructions. Additionally, the CM5 allows for explicit synchronization and message passing, catering to diverse programming styles.
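A rough sketch of that middle ground, assuming nothing about the CM5’s actual runtime: workers proceed asynchronously and only meet at explicit synchronization points where they must exchange results (Python threads and a barrier stand in for processing nodes and the control network).

```python
import threading

N_WORKERS = 4
partials = [0] * N_WORKERS
totals = [0] * N_WORKERS
barrier = threading.Barrier(N_WORKERS)

def worker(rank, chunk):
    partials[rank] = sum(chunk)     # purely local phase: no synchronization needed
    barrier.wait()                  # communication point: all partials must exist
    totals[rank] = sum(partials)    # each node reads the now-complete partial sums

chunks = [[1, 2], [3, 4], [5, 6], [7, 8]]
threads = [threading.Thread(target=worker, args=(r, c)) for r, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert all(total == 36 for total in totals)
```

The program keeps the sequential, data-parallel feel of the CM2 style while letting the hardware run each node’s local phase at its own pace.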
Flexibility and Extensibility: The CM5 offers greater flexibility compared to the CM2, enabling the execution of a wider range of parallel algorithms. Its extensible networks facilitate the integration of additional processors, memory, and input/output devices.
Applications and Versatility: The CM5 has been successfully employed to emulate shared memory machines, demonstrating its versatility in supporting different programming paradigms. As parallel processing matures, the CM5’s hardware architecture adapts to evolving needs and accommodates various programming approaches.
00:58:34 Convergence and Limits of Parallel Machine Architectures
Convergence of Architectures: As computer architectures mature, there is a convergence of different approaches. MIMD (Multiple Instruction, Multiple Data) and SIMD (Single Instruction, Multiple Data) architectures are beginning to incorporate features of each other.
Limits of Parallel Machines: Parallel machines are not useful for all problems. They are particularly effective for problems with large amounts of data generated during computation, allowing for data parallel processing.
Data-Limited Problems: Some problems, such as simulating the motion of 10 planets over time, have limited data but require extensive computation. These problems may not be suitable for parallel processing.
Example of a Data-Limited Problem: Simulating the motion of 10 planets over 100 million years requires solving ordinary differential equations with 10 variables. This problem has limited data and extensive computation, making it challenging to parallelize.
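A bare-bones sketch of why such a problem resists massive parallelism (a generic explicit time-stepping loop, not the integrator discussed in the lecture): each step depends on the result of the previous one, so the long time loop is inherently sequential and only the small amount of work inside a step can be spread across processors.

```python
def integrate(positions, velocities, accel, dt, n_steps):
    """Toy explicit (Euler) integrator for a handful of bodies in one dimension."""
    for _ in range(n_steps):                    # cannot be parallelized: step k needs step k-1
        a = accel(positions)
        velocities = [v + ai * dt for v, ai in zip(velocities, a)]
        positions = [p + v * dt for p, v in zip(positions, velocities)]
    return positions, velocities                # parallelism is limited to ~10 bodies per step

# Toy usage: two bodies pulled toward the origin (illustrative, not real gravity).
p, v = integrate([1.0, -1.0], [0.0, 0.0], lambda ps: [-x for x in ps], dt=0.01, n_steps=1000)
```

With only ten bodies there are at most about ten independent pieces of work per step, so a million-processor machine offers no advantage over a fast sequential one.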
Abstract
Revolutionizing Computing: The Journey of Parallel Processing and Danny Hillis’ Connection Machine
In the world of computing, the evolution of parallel processing marks a pivotal shift, epitomized by the groundbreaking work of Danny Hillis and his Connection Machine. This article delves into Hillis’ journey, from his early inspirations to the development of the Connection Machine, and examines the profound implications of his work on the field of parallel processing. We explore the challenges and myths associated with parallel processing, the evolution of understanding in the field, and the technological advancements that have shaped its current state and future potential.
Early Influences and Collaboration:
Danny Hillis, influenced by his childhood experiences with limited building blocks and his father’s medical research in various countries, developed a keen interest in biology, neurobiology, and emergent behavior. His educational background in biology, mathematics, and computer science gave him a unique perspective. His collaboration with Marvin Minsky at the MIT AI lab further solidified his focus on parallel processing, leading to the development of the Connection Machine. Minsky, a well-known AI researcher, recognized Hillis’ talent and welcomed him into his laboratory.
Early Perceptions of Semiconductor Technology:
In the early days of parallel processing, non-saturated bipolar technology (ECL transistors) was seen as the fastest but power-hungry, while CMOS technology was considered slow and suitable for simple applications. Hillis’ determination to explore parallel processing was driven by his personal interest in solving problems in artificial intelligence, supported by biological evidence like the fast processing capabilities of neurons.
Connection Machine and Emergent Behavior:
Hillis’ doctoral research culminated in the concept of the Connection Machine, a testament to his belief in massive parallelism. Despite initial rejections, Hillis persevered, securing funding from DARPA and founding Thinking Machines Corporation. The machine emphasized emergent behavior, demonstrating complex behavior arising from simple components, a concept Hillis believed held immense potential across many fields. His creative and unusual means of transportation during visits to the house of Claude Shannon, a prominent information theorist and juggler, gained media attention.
Arguments Against Parallel Processing:
Critics argued that slow components would result in slow performance, making parallel processing impractical. Hillis initially focused on developing algorithms expressed in parallel, anticipating the future availability of suitable machines. His work involved common sense reasoning problems and later expanded to vision-related issues.
Overcoming the Myths of Parallel Processing:
The journey of parallel processing involved debunking numerous myths. Contrary to the belief that problems were inherently sequential, Hillis’ work demonstrated that with strategic algorithm design, many problems could be parallelized. This shift in understanding was pivotal in advancing the field, challenging long-standing assumptions like Amdahl’s Law and the sequential nature of direct methods in numerical problems.
Generalization Challenges:
Initial skepticism centered around solving specific problems, not general-purpose computing. Concerns were raised about efficiency and limitations of single-bit processors for arithmetic operations. However, Richard Feynman’s challenge to solve quantum chromodynamics (QCD) revealed the machine’s potential for general numerical problems, opening up a larger market beyond neural networks.
Technological Advances and Hillis’ Motivation:
Hillis’ interest in artificial intelligence and the brain’s neural structure fueled his motivation for massive parallelism. His initial approach, shaped by hands-on exposure to CMOS technology in toy design, leaned toward using many inexpensive CMOS components to run parallel algorithms. His vision was for a general-purpose machine, versatile enough to handle a wide variety of computations.
Feynman’s Challenge and Unexpected Performance:
The Connection Machine’s potential for general numerical problems was highlighted when Richard Feynman, challenging Hillis’ initial notion, proposed using the machine for physics problems like quantum chromodynamics. The machine’s surprising performance in these areas, despite its lack of dedicated arithmetic units, underscored the power of massive parallelism.
Trivializing Parallel Processing:
The realization that many problems thought to be inherently sequential could be solved in parallel emerged gradually. The key lies in considering data as an active participant rather than just manipulating it. The Connection Machine’s architectural advantage allows distributed processing, and techniques like the pointer trick effectively halve the problem size, transforming it into a logarithmic problem.
Implications of Data Parallelism:
Data parallelism emerged as a key idea, simplifying programming and scaling well with increasing processor counts. Contrary to previous beliefs, adding processors under data parallelism made programs faster without making them harder to write. This understanding blunted Amdahl’s Law in practice, since users mainly wanted to solve larger problems within the same timeframe, and it shaped the evolution of parallel computing.
Solving the I/O Problem and Technology Trends:
The initial I/O bottleneck in parallel processing was addressed through the data vault concept, utilizing parallel arrays of small, cheap disks. Additionally, the rise of CMOS technology, initially used in consumer electronics, proved cost-effective and rapidly advanced, further enabling parallel computing.
Parallelism’s Role and Architectural Advances:
Parallelism transformed high-volume technology into high-performance solutions. Hillis’ exploration of SIMD and MIMD architectures revealed the advantages of each, with SIMD excelling in simple parallel tasks and MIMD offering flexibility for complex algorithms. This led to the development of the CM5 Architecture, which addressed previous limitations and optimized for data parallel programming.
From Strict SIMD to a More Flexible Data Parallel Model:
– The evolution from the CM2 toward the CM5 involved a shift from strictly lockstep SIMD execution to a data parallel model that keeps its sequential semantics without lockstep execution.
– The main argument for a data parallel model was to avoid confusion and deadlocks caused by multiple programs running in different processors at the same time.
– At the time, SIMD machines were simpler to build from a hardware standpoint due to technological limitations.
– The SIMD model over-synchronizes things, causing serialization steps where processors sit idle due to conditional statements.
– The data parallel model’s strength lies in its sequential semantics in terms of dependencies, not in every instruction being executed sequentially.
CM5: A More Practical and General Parallel Processing Machine:
– The CM5, like its predecessor, the CM2, is a massively parallel processing machine, but with significant advancements.
– It consists of numerous RISC processing nodes joined by a data network and a control network; the data network is a fat tree rather than a shared bus.
– The CM5 can be programmed in a data parallel manner, similar to the CM2.
– The CM5 offers greater flexibility compared to the CM2, enabling the execution of a wider range of parallel algorithms.
– The CM5 has been successfully employed to emulate shared memory machines, demonstrating its versatility in supporting different programming paradigms.
Maturing of Computer Architectures and Limits of Parallel Machines:
– As computer architectures mature, there is a convergence of different approaches.
– Parallel machines are not useful for all problems.
– Some problems, such as simulating the motion of 10 planets over time, have limited data but require extensive computation.
Danny Hillis’ journey with parallel processing and the development of the Connection Machine revolutionized computing. His work challenged conventional wisdom, introduced new paradigms, and opened avenues for future advancements in computing. Hillis’ legacy is a testament to the power of curiosity, persistence, and innovation in pushing the boundaries of technology.