Demis Hassabis (DeepMind Co-founder) – Towards General Artificial Intelligence | MIT (Jul 2016)
Chapters
00:00:17 AI Development at DeepMind and the AlphaGo Project
Introduction of Demis Hassabis: Demis Hassabis brings a wide range of expertise: he was a chess prodigy, studied computer science at Cambridge, founded successful computer game companies, earned a PhD in neuroscience from UCL, and held a brief postdoc position under Tomaso Poggio.
DeepMind’s AI Development Approach: Demis will share an overview of DeepMind’s approach to AI development, delving into the philosophy behind their methodologies.
The Culmination of AlphaGo: The second half of Demis’s talk will focus on AlphaGo, a groundbreaking achievement in AI that showcases the company’s work.
Future Plans for AlphaGo: Demis will discuss their aspirations for the future of AlphaGo and its potential applications.
00:03:08 DeepMind: Fusing Silicon Valley and Academia for Artificial General Intelligence Research
DeepMind’s Background: DeepMind was founded in 2010 and later joined forces with Google in 2014. It has grown into a large team of over 200 research scientists and engineers.
Demis Hassabis’s Vision: He views DeepMind as an “Apollo program for AI,” aiming to fundamentally solve intelligence and subsequently use it to address various challenges. He believes AI is one of the most critical and powerful technologies mankind can create.
General Purpose Learning Algorithms: DeepMind’s focus is on developing general purpose learning algorithms. These algorithms can learn automatically from raw inputs and experiences without any pre-programming.
Artificial General Intelligence (AGI): The AGI DeepMind aims to build is flexible, adaptive, and potentially inventive, designed to handle unexpected situations and adapt to unseen scenarios.
Narrow AI vs. AGI: Narrow AI is handcrafted for specific purposes and lacks flexibility. AGI, in contrast, can operate across a wide range of tasks and handle unexpected situations.
Deep Blue as an Example of Narrow AI: Deep Blue, a famous chess-playing computer, exhibited narrow AI. It was limited to playing chess and could not perform other tasks.
Reinforcement Learning: DeepMind’s approach to AI and intelligence is through reinforcement learning. Agents interact with an environment, receiving observations and rewards. The goal is to build an accurate model of the environment and select actions that lead towards a desired goal.
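The agent-environment loop described here can be made concrete with a short sketch. The example below is a minimal, hypothetical illustration (the env object with reset() and step(), and the action list, are assumed interfaces, not DeepMind code): the agent observes, selects an action with an epsilon-greedy rule, receives a reward, and updates a simple tabular value estimate.

```python
import random
from collections import defaultdict

def run_episode(env, q_values, actions, epsilon=0.1, alpha=0.1, gamma=0.99):
    """One agent-environment interaction episode: observe, act, receive a reward,
    and update value estimates with a simple tabular Q-learning rule.
    Assumes discrete, hashable observations and a small discrete action set."""
    obs = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        # Explore with probability epsilon, otherwise pick the best-known action.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q_values[(obs, a)])
        next_obs, reward, done = env.step(action)
        # Move the estimate for (obs, action) toward the one-step return.
        best_next = max(q_values[(next_obs, a)] for a in actions)
        target = reward + (0.0 if done else gamma * best_next)
        q_values[(obs, action)] += alpha * (target - q_values[(obs, action)])
        total_reward += reward
        obs = next_obs
    return total_reward

# Usage, given some environment exposing reset() and step(action):
# q_values = defaultdict(float)
# run_episode(my_env, q_values, actions=[0, 1, 2, 3])
```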
00:08:51 Deep Reinforcement Learning and Grounded Cognition: A Framework for Building General Intelligence
Deep Reinforcement Learning: DeepMind’s approach to AI combines deep learning with reinforcement learning, allowing RL to tackle challenging problems at scale. Reinforcement learning involves an agent interacting with an environment, receiving rewards for positive actions and penalties for negative ones, and learning to maximize rewards. DeepMind’s DQN algorithm, a deep reinforcement learning system, can play Atari games without prior knowledge of rules or game structure, achieving human-level performance on most games.
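DQN itself pairs a convolutional network over raw pixels with experience replay and a periodically updated target network. The sketch below shows only the core Bellman-error update, with a small fully connected network standing in for the convolutional one and all names chosen for illustration; it is a generic deep Q-learning sketch under those assumptions, not DeepMind's DQN code.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small Q-network: maps a state vector to one Q-value per action.
    (The DQN described in the talk used a convolutional net over Atari
    pixels; a two-layer MLP keeps this sketch short.)"""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def dqn_update(q_net, target_net, optimizer, replay, batch_size=32, gamma=0.99):
    """One gradient step on the Bellman error, sampled from the replay buffer.
    `replay` holds (state, action, reward, next_state, done) tuples, done as 0.0/1.0."""
    batch = random.sample(replay, batch_size)
    states, actions, rewards, next_states, dones = map(
        lambda xs: torch.as_tensor(xs, dtype=torch.float32), zip(*batch))
    q = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # The target network holds bootstrap targets fixed, which stabilises learning.
        max_next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * max_next_q * (1.0 - dones)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```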
Grounded Cognition: DeepMind believes a true thinking machine must be grounded in a rich sensory-motor reality, whether through physical robots or virtual worlds. Virtual worlds can be used as testing grounds for AI algorithms, providing unlimited training data and independent benchmarks. Games, designed for human players, offer a challenging and diverse environment for AI development.
Systems Neuroscience: DeepMind incorporates systems neuroscience to understand the brain’s algorithms, representations, and architectures, focusing on the computational level rather than low-level synaptic details. Areas of focus include memory, attention, concepts, planning, navigation, and imagination, aiming to go beyond the success achieved with Atari games.
00:16:19 Neural Turing Machines for Symbolic Reasoning
Introduction of Neural Turing Machines (NTMs): The idea of an artificial hippocampus, the brain structure involved in memory and navigation, motivated the search for a way to incorporate memory into neural networks. NTMs, a novel neural architecture, aim to address the need for large-scale, controllable memory in neural networks.
Key Features of NTMs: An NTM comprises a recurrent neural network controller, analogous to a CPU, coupled to an extensive external memory store, akin to the RAM of a conventional computer. This architecture allows memory elements to be learned and manipulated through gradient descent.
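The content-based addressing that lets an NTM read from its memory differentiably can be sketched in a few lines. This is a simplified illustration of the mechanism from the NTM paper (cosine similarity between a controller-emitted key and each memory row, sharpened by a softmax); the controller network, write heads, and location-based addressing are omitted.

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Soft, content-based read from an external memory matrix.

    memory: (N, M) array, N memory slots of width M
    key:    (M,) query vector emitted by the controller network
    beta:   scalar sharpness ("key strength")

    Because the weights are a differentiable softmax over cosine similarities,
    the whole read can be trained by gradient descent, which is the core idea
    behind the Neural Turing Machine's memory access.
    """
    eps = 1e-8
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    weights = np.exp(beta * sims)
    weights /= weights.sum()
    read_vector = weights @ memory  # weighted sum over memory rows
    return read_vector, weights

# Example: 8 memory slots of width 4, queried with a random key.
memory = np.random.randn(8, 4)
vec, w = content_addressing(memory, np.random.randn(4), beta=5.0)
```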
Symbolic Reasoning with NTMs: NTMs open up new avenues for symbolic reasoning, considered a holy grail in AI, including classic AI problems such as the SHRDLU class of blocks-world tasks, which involve manipulating blocks and answering questions about the scene.
Mini-SHRDLU: A Simplified Blocks World Problem: Because the full SHRDLU class of problems is too complex, researchers introduced a simplified 2D version called mini-SHRDLU. In mini-SHRDLU, a neural network agent learns to manipulate colored blocks stacked in vertical towers to match a target configuration, demonstrating problem-solving capabilities.
Training and Testing of NTMs: NTMs are trained using reinforcement learning, allowing them to improve over time by trial and error. Once trained, NTMs are tested on unseen scenarios, evaluating their ability to solve problems optimally.
Solving Logic Puzzles and Graph Problems: NTMs have shown promising results in solving logic puzzles, demonstrating their capacity for symbolic reasoning. Ongoing research explores the application of NTMs to graph problems, a general class of problems with broad applications.
Upcoming Publications on NTMs: The team is preparing a significant publication later in the year, building upon their previous work on NTMs. This publication is expected to provide further insights and advancements in the field.
00:21:07 Unlocking the Mysteries of Go: From Rules to the Deep Blue Challenge
Neural Turing Machine and SHRDLU Tasks: DeepMind is experimenting with incorporating a simplified form of language into the SHRDLU tasks: a Neural Turing Machine reads and remembers constraints given in code and uses them to solve the puzzles.
Moving to 3D Environments: DeepMind has repurposed the Quake 3 engine, calling it Labyrinth, to tackle navigation and 3D vision problems in a virtual environment. An agent learns to navigate, collect rewards, and find exits solely through pixel inputs.
Towards a Rat-Level AI: DeepMind aims to create an AI agent capable of performing various tasks that rats can do, drawing inspiration from experimental ideas and tests from rat literature.
AlphaGo and the Challenge of Go: AlphaGo combines neural networks with planning approaches to master the complex game of Go. Go’s profound complexity stems from its simple rules, vast possible board configurations, and the need for both intuition and calculation.
Go’s Challenges for Computers: The huge branching factor (roughly 200 possible moves per position) and the lack of a usable evaluation function make brute-force search impractical and handcrafted evaluation rules ineffective. Go is a constructive game with no concept of material, so evaluating a position requires long-range prediction and is highly sensitive to local changes. Mastery demands both intuition and calculation, pushing the limits of human capabilities.
00:31:15 Training AlphaGo: From Supervised Learning to Reinforcement Learning and Monte Carlo Tree Search
Divine Moves and Intuitive Play in Go: Top Go players often describe brilliant moves as feeling “right” rather than being calculated. Go has a history and tradition of intuition and a concept called “divine moves.” Famous games and moves are passed down, inspiring players to achieve their own divine moves.
Training AlphaGo with Deep Neural Networks: AlphaGo used two deep neural networks: a policy network and a value network. The policy network was trained on human expert data to predict human moves. Reinforcement learning was used to improve the policy network’s win rate through self-play. The value network was trained on game data to evaluate board positions and predict the winner.
Policy Network and Value Network: The policy network takes a board position as input and outputs a probability distribution over possible moves. The value network takes a board position as input and outputs a single real number indicating the probability of white or black winning.
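The input/output contracts of the two networks can be sketched as follows. AlphaGo trained two separate deep convolutional networks over many hand-chosen input feature planes; the sketch below instead uses one tiny shared trunk with a policy head and a value head, purely to show the shapes described above, so every layer size and name here is illustrative rather than AlphaGo's architecture.

```python
import torch
import torch.nn as nn

BOARD = 19  # 19x19 Go board

class PolicyValueSketch(nn.Module):
    """Illustrative pair of heads in the spirit of AlphaGo's two networks."""
    def __init__(self, in_planes=3, channels=32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Policy head: one logit per board intersection (361 moves; pass omitted).
        self.policy_head = nn.Conv2d(channels, 1, kernel_size=1)
        # Value head: a single number in (0, 1), the estimated win probability.
        self.value_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * BOARD * BOARD, 1),
            nn.Sigmoid(),
        )

    def forward(self, board_planes):
        h = self.trunk(board_planes)
        move_probs = torch.softmax(self.policy_head(h).flatten(1), dim=1)  # (batch, 361)
        win_prob = self.value_head(h)                                      # (batch, 1)
        return move_probs, win_prob

# probs, value = PolicyValueSketch()(torch.zeros(1, 3, BOARD, BOARD))
```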
Monte Carlo Tree Search with Neural Networks: Monte Carlo Tree Search is used to plan moves by expanding the search tree and evaluating positions. The policy network is used to narrow the search width by suggesting probable moves. The value network is used to evaluate the desirability of positions, reducing the search depth. AlphaGo combines value network evaluation with Monte Carlo rollouts for accurate position assessment.
Neural Networks Reduce Search Complexity: The policy network reduces the width of the search tree by suggesting probable moves. The value network reduces the depth of the search tree by providing an evaluation of board positions. This combination of neural networks makes the search more efficient and accurate.
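A common way to combine these ingredients in the selection step of the tree search is a PUCT-style rule: each edge's score blends its running value estimate with the policy network's prior, scaled down as the edge accumulates visits. The sketch below assumes a simple node/edge data structure (prior, value, visits) that is illustrative, not AlphaGo's actual implementation.

```python
import math

def select_move(node, c_puct=1.0):
    """PUCT-style selection step for an AlphaGo-like tree search.

    node.edges maps a move to an edge holding:
      prior  : policy-network probability P(s, a), which narrows the search width
      value  : running mean of evaluations Q(s, a), from the value network
               blended with rollouts, which reduces the search depth
      visits : visit count N(s, a)
    All attribute names are illustrative placeholders.
    """
    total_visits = sum(edge.visits for edge in node.edges.values())

    def score(edge):
        exploration = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.value + exploration

    return max(node.edges.items(), key=lambda kv: score(kv[1]))[0]
```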
00:39:53 AlphaGo's Performance Against Computer Programs and Human Players
Performance against Commercial Go Programs: AlphaGo was tested against Crazy Stone and Zen, the two leading commercially available Go programs, and won 494 of its 495 games against them. Even when giving the other programs a four-stone handicap, AlphaGo maintained a 75% win rate. AlphaGo’s single-machine version outperformed the distributed versions of the competitor programs.
Elo Rating System: A numerical Elo rating system, similar to the one used in chess, was used to assess the strength of Go programs. A gap of 200-250 Elo points corresponds to roughly an 80% win rate.
AlphaGo’s Superiority: AlphaGo surpassed the other best programs by more than 1,000 Elo points.
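For reference, the Elo model maps a rating gap directly to an expected score, which is how figures like the "roughly 80% win rate for a 200-250 point gap" are derived. A minimal sketch:

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A gap of 200-250 points gives an expected score of about 0.76-0.81,
# consistent with the roughly 80% win rate quoted in the talk.
print(round(elo_expected_score(1250, 1000), 3))  # ~0.808
```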
Challenge to Human Players: After defeating the top computer programs, AlphaGo was ready to face human opponents. Fan Hui, a Chinese-born Go player based in France, was selected as the first human challenger.
00:42:08 AlphaGo: Defeating Professional Go Players
AlphaGo’s 5-0 Victory Against a Professional Go Player: AlphaGo, a computer program developed by Demis Hassabis and his team, achieved a groundbreaking victory against a professional Go player, becoming the first program to reach professional status in the game. The defeated player, a two-dan professional and three-time European champion, initially preferred a slow and strategic approach but switched to a more aggressive style in subsequent games, resulting in a 5-0 loss to AlphaGo.
Expert’s Perspective on AlphaGo’s Victory: AI experts and top programmers had predicted that this milestone would take at least another decade to achieve, making AlphaGo’s victory a significant surprise. The Go world, including professional players, believed that it would take much longer for a computer program to reach professional levels in the game.
AlphaGo’s Impact on the Professional Go Player: After the match, the defeated player joined AlphaGo’s team as a consultant, contributing to the program’s development. The player’s experience playing against AlphaGo led to a significant improvement in his own skills, moving him up in the world rankings. The player attributed his improvement to AlphaGo’s unique style, which freed his mind from traditional constraints and allowed him to think creatively about the game.
AlphaGo’s Match Against Lee Sedol: AlphaGo’s next challenge was to face Lee Sedol, a legendary Go player known as the “Roger Federer of Go.” Lee Sedol had won 18 world titles and was renowned for his creative style and brilliance. The match between AlphaGo and Lee Sedol took place in early March in Korea, with a million-dollar first prize at stake.
00:45:21 AlphaGo's Creative Play and Intuition in Go
Key Findings from AlphaGo’s 4-1 Match Win against Lee Sedol: AlphaGo’s Compute Power: AlphaGo used roughly the same compute power as in the Fan Hui match, since adding more hardware yields only modest gains in strength (diminishing returns from parallelization). AlphaGo’s Drastic Improvement: The new version of AlphaGo was significantly stronger than the old one, winning 99.9% of games against it, an improvement achieved within five months. AlphaGo’s Honorary 9-Dan Certificate: The Korean Go Association recognized AlphaGo’s creative play by awarding it an honorary 9-dan certificate.
AlphaGo’s Creativity and Intuition: AlphaGo’s Move 37 in Game Two: AlphaGo made an astounding move, defying conventional wisdom in Go, by playing on the fifth line instead of the traditional third or fourth lines. The Significance of Move 37: This move highlighted AlphaGo’s ability to think strategically and plan long-term, as its impact became evident 50 moves later, influencing a fight in the bottom left corner. Reactions to Move 37: Commentators and professionals were astounded by AlphaGo’s move, with some believing it was a misclick due to its unconventional nature. Statistics Behind Move 37: The policy network gave the prior probability of this move as less than 1 in 10,000, indicating that AlphaGo did not learn it from professional games.
AlphaGo’s Originality and Potential: AlphaGo’s Original Move: Many professionals commented that Move 37 was not a human move, emphasizing AlphaGo’s ability to generate novel strategies. AlphaGo’s Potential: AlphaGo’s capacity for original thought and strategic planning suggests its potential to contribute to scientific and creative fields beyond Go.
00:52:11 AlphaGo's Surprising Moves and Cultural Impact
AlphaGo’s Surprising Moves: AlphaGo displayed originality by making moves that were unexpected even to its developers. It values influence toward the center of the board so highly that it considers giving up territory for fifth-line influence to be a good trade. This challenges conventional wisdom in the game of Go and may lead to a reevaluation of which trades are acceptable.
Lee Sedol’s Win in Game 4: Lee Sedol won Game 4 with a brilliant move that AlphaGo’s policy network had rated at less than a 1-in-10,000 probability. The move forced AlphaGo to abandon the lines it had been pondering and search again, and its value network mis-evaluated the resulting position. The move highlights Lee Sedol’s creativity and fighting spirit, making him one of the top game players Demis Hassabis has met.
Cultural Impact of the Match: The match attracted 280 million viewers worldwide, surpassing the Super Bowl’s viewership. There was widespread media coverage, with 35,000 press articles daily and front-page features in Korean newspapers. The match popularized Go in the West, leading to a shortage of Go boards and increased interest in the game.
AlphaGo’s Progress: AlphaGo’s progress has been remarkable, improving by roughly one rank per month over the last 18 months. The project iterates: self-play generates more and higher-quality data, which is then used to train stronger new versions. The rate of progress has been astonishing, and it remains to be seen how far it can go before reaching optimal play.
00:55:57 AlphaGo's Impact on Go and AI's Potential for Real-World Applications
AlphaGo’s Impact on the Game of Go: AlphaGo has revitalized the game of Go and brought new ideas and creativity to the game, potentially improving the overall standard of Go. Professional Go players are excited about AlphaGo’s release to the public, believing it will further enhance their understanding and enjoyment of the game.
Comparison of AlphaGo with Deep Blue: Unlike Deep Blue, which relied on handcrafted chess knowledge, AlphaGo learns solely from expert games and self-play, demonstrating its ability to acquire knowledge without explicit programming. AlphaGo’s search algorithm is highly selective, guided by two neural networks, allowing it to evaluate 100,000 positions per second compared to Deep Blue’s 200 million positions per second.
Defining Intuition and Creativity in the Context of AlphaGo: Intuition in Go is considered implicit knowledge gained through experience, not consciously accessible or expressible but verifiable through behavioral output. Creativity in Go involves synthesizing accumulated knowledge to produce novel or original ideas. AlphaGo demonstrates both intuition and creativity within the constrained domain of Go.
DeepMind’s Goals Beyond AlphaGo: DeepMind aims to apply the general-purpose technologies developed for AlphaGo to solve challenging real-world problems. Areas of interest include healthcare, robotics, and personal assistance.
Future Developments and Challenges: Exploring the possibility of a group of top professional Go players competing against AlphaGo, leveraging their collective strengths in different aspects of the game. Developing new analysis, statistical, and visualization tools to better understand the inner workings of deep learning systems like AlphaGo. DeepMind’s Virtual Brain Analytics project seeks to create tools to analyze and visualize the representations formed by neural networks.
Openings for Collaboration: DeepMind is actively hiring research scientists and software engineers to join their team and contribute to their ongoing work.
01:02:58 Supervised Learning vs. Reinforcement Learning in AI Development
New Experiments in Reinforcement Learning: Researchers at DeepMind plan to conduct an experiment to train an AI system from scratch using reinforcement learning, without the use of supervised learning or human expert play. This experiment aims to determine if it is possible for an AI system to achieve expert-level performance through reinforcement learning alone.
Challenges of Generalizing AI from Games: There is a risk that AI systems trained on games may not generalize well to real-world applications. It is crucial to ensure that the AI system only has access to information that it would have access to in the real world. DeepMind’s evaluation team is separate from the algorithm development teams and responsible for creating the environments and APIs for the AI systems. This ensures that the AI systems cannot access information they are not supposed to.
Approaches to AI Development: DeepMind is pursuing both self-training and research in new architectures and parameters to develop AI systems. The company is utilizing a combination of approaches to achieve the best results.
Abstract
The Evolution of AI: From Chess to Go, and Beyond
Breaking Ground in AI: Demis Hassabis and the DeepMind Odyssey
Demis Hassabis, a figure synonymous with contemporary AI advancements, recently delivered an enlightening talk at MIT. Hassabis, whose journey from chess prodigy to AI luminary encompasses extensive studies in computer science, successful gaming ventures, and neuroscience research, is the mastermind behind DeepMind, an organization that has redefined the landscape of artificial intelligence.
DeepMind’s Inception and Ambitious Mission
DeepMind, founded in 2010 and united with Google in 2014, embarked on a mission to unravel the enigma of intelligence. Its goal is to replicate the versatility and adaptability of human intelligence in machines, thereby enabling them to autonomously tackle a diverse array of challenges. The organization has grown into a large team of over 200 research scientists and engineers. Demis Hassabis views DeepMind as an “Apollo program for AI,” aiming to fundamentally solve intelligence and subsequently use it to address various challenges.
Philosophical Underpinnings and Methodological Approaches
DeepMind’s philosophy is centered on creating general-purpose learning algorithms. These algorithms are designed to automatically learn from raw inputs and experiences without pre-programming, exhibiting flexibility and adaptability necessary for unforeseen situations. This is in stark contrast to narrow AI systems limited to specific tasks. DeepMind’s AGI is designed to be flexible, adaptive, and potentially inventive, capable of operating across a wide range of tasks and handling unexpected scenarios, unlike the narrow AI exhibited by the chess-playing computer Deep Blue.
Advancing the Frontiers of AI Through Reinforcement Learning
Reinforcement learning (RL), where an AI agent learns from interactions within an environment guided by reward signals, is at the heart of DeepMind’s strategy. This approach, inspired by the brain’s dopamine-driven learning mechanisms, is seen as pivotal in achieving general intelligence.
From Virtual Playgrounds to Real-World Challenges
DeepMind posits that true AI must be grounded in rich sensory-motor experiences. Virtual worlds and games, with limitless data and controlled testing environments, are ideal platforms for developing and testing AI algorithms. These environments provide unlimited training data and independent benchmarks, offering a challenging and diverse environment for AI development.
Deep Reinforcement Learning: The Fusion of Learning and Perception
DeepMind’s approach to AI, combining deep learning with reinforcement learning, is exemplified in Deep Reinforcement Learning (DRL). DRL, initially tested on the Atari 2600 platform, demonstrated AI’s ability to master various games, signaling a significant leap in AI capabilities. The approach involves an agent interacting with an environment, receiving rewards for positive actions and penalties for negative ones, and learning to maximize rewards. DeepMind’s DQN algorithm, a deep reinforcement learning system, plays Atari games without prior knowledge of rules or structure, achieving human-level performance.
Artificial Hippocampus and Neural Turing Machines (NTMs)
DeepMind’s exploration into systems neuroscience has led to the development of the Neural Turing Machine (NTM), a model combining neural networks with a memory storage system, allowing for symbolic reasoning and complex problem-solving. NTMs address the need for large-scale, controllable memory in neural networks. They comprise a recurrent neural network controller, similar to a CPU, and an extensive memory store, like the RAM of a conventional computer, allowing memory to be learned and manipulated through gradient descent. NTMs open up new avenues for symbolic reasoning and can tackle classic AI problems, such as the SHRDLU class of blocks-world problems, involving block manipulation and scene understanding. Mini-SHRDLU, a simplified 2D blocks world problem, demonstrates NTMs’ problem-solving capabilities. Trained using reinforcement learning, NTMs improve over time and are tested on unseen scenarios. They show promise in solving logic puzzles and graph problems, with a significant publication on NTMs expected later in the year.
The AlphaGo Phenomenon: A Milestone in AI
AlphaGo, combining neural networks with advanced planning techniques, marked a watershed moment in AI history. Mastering Go, a game known for its vast search space and reliance on intuition, AlphaGo demonstrated unprecedented AI sophistication. AlphaGo overcame Go’s challenges, which include a huge branching factor and the need for intuition and calculation. It used two deep neural networks, a policy network and a value network, trained through supervised learning and reinforcement learning, respectively. The policy network predicts moves, while the value network evaluates board positions. AlphaGo’s use of Monte Carlo Tree Search, assisted by neural networks, allowed for more efficient search processes, demonstrating its originality and potential.
Training AlphaGo: A Blend of Human Mimicry and Self-Learning
AlphaGo’s training involved a blend of human mimicry and self-learning. It utilized a policy network trained through supervised learning to mimic human moves, and a value network to predict game outcomes. Additionally, it underwent millions of self-play games, constantly improving its strategies and generating a new dataset of expert-level games.
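The self-play stage can be illustrated with a generic policy-gradient update: after a finished game, the policy is nudged toward the moves played by the winning side. The sketch below is a plain REINFORCE-style update under assumed interfaces (a policy network returning move probabilities, as in the earlier sketch), not AlphaGo's actual training pipeline.

```python
import torch

def reinforce_from_selfplay(policy_net, optimizer, game_states, game_moves, winner_sign):
    """One REINFORCE-style update from a finished self-play game.

    game_states: tensor of board encodings for the positions the learner faced
    game_moves:  tensor of move indices actually played from those positions
    winner_sign: +1.0 if the learner won the game, -1.0 if it lost

    Self-play games become new training data, and the policy is pushed toward
    moves that led to wins; this is a generic policy-gradient sketch, not
    AlphaGo's training code.
    """
    move_probs, _ = policy_net(game_states)  # (T, 361), as in the earlier sketch
    log_probs = torch.log(
        move_probs.gather(1, game_moves.unsqueeze(1)).squeeze(1) + 1e-8)
    loss = -(winner_sign * log_probs).mean()  # raise probability of winning moves
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```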
Monte Carlo Tree Search: Enhancing AI Strategy
The incorporation of Monte Carlo Tree Search (MCTS) in AlphaGo’s design revolutionized its strategy. MCTS, augmented by neural networks, allowed for a more efficient search process, evaluating the desirability of moves based on a blend of action value and prior probability.
AlphaGo’s Astonishing Achievements and Future Implications
AlphaGo’s stunning victories, including a 5-0 win against Fan Hui and a 4-1 triumph over Lee Sedol, not only shocked the Go community but also ignited global discourse on AI’s potential and limitations. AlphaGo’s use of original strategies, especially the famous “Move 37” against Lee Sedol, showcased an AI’s ability to generate novel strategies, challenging traditional Go strategies. DeepMind’s vision extends beyond mastering games. Its technologies, developed through projects like AlphaGo, are being directed towards solving real-world problems in healthcare, robotics, and personal assistance. The success of AlphaGo has also spurred interest in demystifying deep learning models, with initiatives like Virtual Brain Analytics at the forefront. Plans are underway to train AI using only reinforcement learning, starting from a blank slate. This approach could unlock new fields of expertise and understanding, while ensuring AI generalizes beyond games to real-world applications.
Pioneering a New Era of Artificial Intelligence
DeepMind, under Hassabis’s leadership, is not just transforming our understanding of games; it is reshaping our perspective on artificial intelligence. From the depths of ancient Go strategies to the potential of AI in addressing complex real-world challenges, DeepMind’s journey is a testament to the boundless possibilities within the field of artificial intelligence.