Danny Hillis (Thinking Machines Corporation Co-founder) – Architecture of the CM-5 (Aug 2016)


Chapters

00:00:53 Evolution of the Connection Machine Architecture
00:12:06 Data Parallel Programming in the Thinking Machine Era
00:16:42 Evolution of Connection Machine Architecture
00:22:35 Network-Based MIMD Machine Architecture
00:25:06 Designing a Scalable Teraflop Supercomputer
00:33:54 Networks and Adaptive Routing
00:35:58 Fault Tolerance and Clocking in Massively Parallel Machines
00:44:16 Packaging the Connection Machine
00:48:22 The Future of Parallel Computing: Beyond Speed
00:54:14 Decoupling Processor and Network Design for Massively Parallel Processors

Abstract

The Pioneering Journey of Parallel Computing: From CM1 to CM5 and Beyond


This article explores the evolution of parallel computing, focusing on the Connection Machine series (CM1 to CM5). It covers the initial motivations, architectural innovations, programming models, and the eventual transition to a new era of computing. The article adopts an inverted pyramid style, prioritizing significant elements like the CM5’s groundbreaking architecture and paradigm shifts in parallel computing while detailing the technological and design advancements across the series.



Introduction:

Parallel computing has changed considerably over its history, and the Connection Machine series, from CM1 through CM5, is central to that story. The series reflects a broader narrative of innovation, setbacks, and the pursuit of ever higher computing performance.



Motivation and Early Iterations:

The genesis of the Connection Machine can be traced back to Daniel Hillis’ fascination with building thinking machines. His early work, influenced by Scott Fahlman’s NETL machine, pivoted towards massive parallelism, an approach that would define the Connection Machines’ core philosophy.



The Birth of CM1:

The Connection Machine 1 (CM1) represented a paradigm shift. It introduced a model in which a host computer controlled a large number of simple processors that performed arithmetic and Boolean operations in parallel. Its communication networks set the stage for subsequent developments in parallel computing.



Programming Model Evolution:

A significant leap in the CM series was the introduction of virtual processors. This abstraction hid the physical processor count, so programmers could write as if one processor were available for every data element, however many elements there were. It simplified problem-solving and introduced new patterns in parallel programming. The CM1 hardware, however, had limitations of its own.
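
The idea of virtual processors can be illustrated with a minimal sketch. The block mapping and the function names `virtual_to_physical` and `parallel_add` are assumptions made for illustration; the notes do not describe how the Connection Machine actually assigned virtual processors to hardware.

```python
def virtual_to_physical(num_virtual, num_physical):
    """Assign each virtual processor to a physical one in contiguous blocks.

    Hypothetical mapping for illustration only.
    """
    vp_ratio = -(-num_virtual // num_physical)  # ceiling division
    return [v // vp_ratio for v in range(num_virtual)]


def parallel_add(a, b, num_physical):
    """Elementwise a + b, written as if there were one processor per element."""
    owner = virtual_to_physical(len(a), num_physical)
    result = [None] * len(a)
    # Each physical processor loops over the virtual processors it hosts.
    for p in range(num_physical):
        for v in range(len(a)):
            if owner[v] == p:
                result[v] = a[v] + b[v]
    return result


print(virtual_to_physical(8, 4))
print(parallel_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], num_physical=2))
```

The caller of `parallel_add` never mentions the physical processor count in the algorithm itself; only the mapping does, which is the point of the abstraction.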



Overcoming CM1’s Limitations:

CM1’s groundbreaking approach was held back by inadequate memory and the absence of floating-point support, which motivated the next machine in the series.



The Advancement to CM2:

Addressing CM1’s shortcomings, the Connection Machine 2 (CM2) offered more memory, integrated floating-point units, and improved parallel I/O. Combining the simple processors with floating-point hardware made the CM2 a hybrid design, and it widened the range of applications that could use parallel computing.



The Rise of Data Parallel Programming:

The advent of the CM-200 model facilitated a shift towards data parallel programming. This paradigm emphasized algorithmic focus over specific processor assignments and introduced compiler optimizations for parallel operations. Data parallelism, in contrast to control flow parallelism, scaled with data size, offering a new dimension of scalability in computing.
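
As an illustration of the data parallel style, the sketch below computes a running sum with a well-known data-parallel prefix-sum pattern. It is not taken from the talk, and the function name `inclusive_scan` is hypothetical. Each pass of the loop stands for one operation applied to every element at once, so the number of passes grows with the logarithm of the data size while the parallelism grows with the data size itself.

```python
def inclusive_scan(values):
    """Running sum computed in about log2(n) data-parallel steps."""
    x = list(values)
    step = 1
    while step < len(x):
        # One data-parallel step: every element is updated at the same time,
        # reading only from the previous version of the array.
        x = [x[i] + x[i - step] if i >= step else x[i] for i in range(len(x))]
        step *= 2
    return x


print(inclusive_scan([1, 2, 3, 4, 5]))
```

The program says what happens to each element and never names a processor, which is the "algorithmic focus over specific processor assignments" described above.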



The Convergence of SIMD and MIMD in CM5:

The Connection Machine 5 (CM5) represented a pinnacle in the series, blending the strengths of SIMD and MIMD architectures. CM5’s design, featuring a distributed memory system and a dual-network structure, epitomized the balance between programming ease and execution efficiency.
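
One way to picture this blend is a set of nodes that each run independently on local data and synchronize only where the program needs a consistent global view. The sketch below uses Python threads and `threading.Barrier` as a stand-in; the function `run_nodes` and the way work is divided are assumptions for illustration, not a description of the CM5's hardware or compiler.

```python
import threading


def run_nodes(local_data):
    """Each 'node' sums its local data, synchronizes, then reads all partial sums."""
    n = len(local_data)
    partial = [0] * n
    totals = [0] * n
    barrier = threading.Barrier(n)

    def node(rank):
        # Independent (MIMD-like) phase: no coordination with other nodes.
        partial[rank] = sum(local_data[rank])
        # Synchronization point: wait until every node has written its result.
        barrier.wait()
        # After the barrier, every node sees the same global state.
        totals[rank] = sum(partial)

    threads = [threading.Thread(target=node, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


print(run_nodes([[1, 2], [3], [4, 5, 6]]))
```

Between barriers the nodes are free to take different amounts of time or different code paths, while the programmer still reasons about the computation as a sequence of whole-array steps.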



CM5’s Architectural Brilliance:

The CM5 stood out for several features: indirect addressing, a network that acted as a system bus, dynamic data motion, scalable I/O configurations, fault tolerance, and a design optimized for timesharing. Together these set a new benchmark in parallel computing architecture.

– The CM series provided a seamless integration of SIMD and MIMD-like behavior, ensuring programming ease and efficient execution.

– The compiler conducted dynamic synchronization optimization, providing the programmer with a coherent model while exploiting the MIMD machine’s capabilities.

– The CM5’s network structure doubled as a system bus, enabling direct I/O connectivity and flexible data movement patterns, such as striping across multiple disks.

– The expandable network accommodated an arbitrary number of I/O devices, adapting to diverse application needs and ensuring scalability.

– Designed with integrated timesharing capabilities, the CM series facilitated batch processing and interactive use.

– Partitions could be dedicated to specific tasks or utilized for timesharing, maximizing resource utilization and accommodating varying workloads.
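
The disk striping pattern mentioned in the list above can be sketched as a round-robin distribution of fixed-size blocks. This is a generic illustration; the names `stripe` and `unstripe`, the block size, and the round-robin order are assumptions, not details of the CM5's I/O system.

```python
def stripe(data: bytes, num_disks: int, block_size: int):
    """Split data into blocks and deal them out to disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks):
    """Reassemble the original data by reading blocks back in the same order."""
    num_blocks = sum(len(d) for d in disks)
    blocks = []
    for block_index in range(num_blocks):
        disk = disks[block_index % len(disks)]
        blocks.append(disk[block_index // len(disks)])
    return b"".join(blocks)


disks = stripe(b"abcdefghij", num_disks=3, block_size=2)
print(disks)
print(unstripe(disks))
```

Because consecutive blocks land on different disks, a large sequential transfer can keep all disks busy at once, which is why aggregate bandwidth can grow as more I/O devices are attached.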



Ambitious Goals and Processor Design:

Driven by ambitious performance goals, the CM5’s design targeted teraflop performance, terabytes of dynamic memory, and terabits per second of I/O throughput.

– The processing node paired a standard SPARC processor with vector processing units and other features specialized for parallel processing.

– Using a standard processor as the base helped ensure both performance efficiency and cost-effectiveness.



Innovative Network Design and Fault Tolerance:

The CM5’s network structure was a marvel of engineering, featuring scalable bandwidth and adaptive routing algorithms. Extensive diagnostics, comprehensive error logging, and fault tolerance mechanisms exemplified a design philosophy that prioritized reliability and maintenance.

– The CM5’s network architecture was meticulously designed for teraflop performance, drawing insights from real-world problems to determine network requirements.

– The fat-tree network topology, coupled with an adaptive routing algorithm, provided scalable bandwidth and efficient data transfer among processing nodes.

– This innovative network structure addressed the bandwidth demands of massively parallel computers, enabling effective communication and data exchange.
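
The general idea behind adaptive routing can be sketched as follows: when several output links lead equally well toward a destination, the router picks one based on current load rather than a fixed rule. The notes do not describe the CM5's actual algorithm, so the shortest-queue policy and the names `choose_link` and `route_messages` are assumptions for illustration.

```python
def choose_link(queue_lengths):
    """Pick the output link with the shortest queue (ties go to the lowest index)."""
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


def route_messages(num_links, num_messages):
    """Send messages one at a time, always onto the currently least-loaded link."""
    queues = [0] * num_links
    for _ in range(num_messages):
        queues[choose_link(queues)] += 1
    return queues


print(choose_link([3, 1, 2]))
print(route_messages(num_links=4, num_messages=10))
```

Spreading traffic across equivalent links in this way keeps any single link from becoming a bottleneck, which is one way usable bandwidth can scale as links are added.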



Programming Model and Future Directions:

The CM series, especially CM5, influenced programming models in parallel computing, moving towards a paradigm where data distribution and parallel processing were central. The design’s scalability and adaptability to technological advancements signified a forward-thinking approach, setting a precedent for future developments in the field.

– The innovative design of the CM series stimulated new software developments, pushing the boundaries of parallel programming.

– The machine’s unique capabilities facilitated the exploration of novel programming paradigms, enabling researchers to tackle complex computational challenges.

– Daniel Hillis anticipated further advancements in software for the Connection Machine, recognizing its potential to transform scientific computing.





Conclusion:

The journey from CM1 to CM5 in parallel computing is a tale of innovation, challenge, and evolution. The CM series not only pushed the boundaries of computing performance but also influenced the direction of programming models and architectural design in computing. The legacy of the Connection Machines endures, highlighting a pivotal chapter in the history of computing technology.


Notes by: OracleOfEntropy