Fei-Fei Li (Stanford Professor) – “From Seeing to Doing” (Oct 2022)


Chapters

00:00:05 Beckman Brown Lecture on Interdisciplinary Science
00:03:57 From Seeing to Doing: Understanding and Interacting with the Real World
00:07:12 Vision and the Evolution of Intelligence
00:13:48 Evolution of Object Recognition in AI
00:22:18 Visual Relationships and Scene Graph Representation for Scene Understanding
00:27:59 Ecological Approach to Robotic Learning in Household Environments
00:41:45 Modular vs End-to-End Approaches in Robotics Learning
00:55:24 Challenges and Considerations in Robotic Learning
01:05:41 National AI Research Cloud: Addressing the Digital Divide
01:08:08 Advancements in AI Language Models

Abstract

Bridging Visions and Actions: Exploring the Intersection of AI, Human Intelligence, and Robotics

Introduction

The Beckman Brown Lecture on Interdisciplinary Science, an event honoring Arnold O. Beckman and Dr. Ted Brown, recently delved into the interplay of artificial intelligence, human cognition, and robotics. Supported by the Arnold and Mabel Beckman Foundation, the lecture featured insights from Fei-Fei Li, a renowned computer scientist whose work draws on human neuroscience, along with remarks from Ted Brown, the Beckman Institute’s founding director. The discussion traced the evolution of vision from the Cambrian period to the latest developments in AI and robotics, emphasizing the critical role of vision in both natural and artificial systems.

Diane Beck, interim head of the Department of Psychology and a member of the Beckman Institute Executive Committee, opened the event. Ted Brown reminisced about the Beckmans and acknowledged the Foundation’s support, including an endowment for an annual lectureship and a postdoctoral fellowship. He introduced Aman Ahmed, the newly appointed postdoctoral fellow, who will collaborate with Martha Gillette and Brad Sutton. The Beckman Institute, since its inception in 1989, has included a group dedicated to the cognitive sciences, recognizing the field’s emerging significance. The concept of embodiment, introduced in the 1991 book “The Embodied Mind” by Varela, Thompson, and Rosch, has since spurred extensive research in this field. Dr. Fei-Fei Li’s lecture exemplifies the reach of this concept and our connection with the world through AI. Diane Beck concluded the introduction and welcomed Dr. Li to speak.

The Evolution of Vision and Its Impact

The Cambrian explosion, a crucial event 540 million years ago, was likely triggered by the evolution of vision, as theorized by Andrew Parker. This event, which led to a rapid diversification of animal species, underscores the vital role of vision in survival and navigation, with most animals relying heavily on it. Fei-Fei Li’s research aims to extend our understanding of human vision, highly efficient in object recognition, to machines, developing intelligent systems with advanced visual capabilities.

AI’s Journey in Object Recognition

Initially, computer vision models relied on hand-designed algorithms, but the field evolved significantly with the introduction of machine learning. The creation of large datasets, particularly ImageNet, with 15 million images across 22,000 object classes, marked a turning point. This development culminated in the 2012 breakthrough of convolutional neural networks on the ImageNet challenge, heralding a new era of deep learning and continually improving image-recognition capabilities. Fei-Fei Li stressed the importance of moving beyond mere object recognition to understanding the relationships between objects in visual scenes, as embodied in the Visual Genome dataset.

Vision, Perception, and Action in Intelligence

Fei-Fei Li argued that vision alone is not enough for a complete understanding of the world and advocated for an integrated approach, coupling vision with action. This perspective is vital in robotics, where learning is driven by interaction with the environment. The discussion concluded with an affirmation of vision’s fundamental role in intelligence, enabling understanding and action in the world, and its critical importance for developing intelligent machines and advancing AI.

Advancements in Robotic Learning

Li introduced various approaches in robotic learning, including curiosity-driven and task-driven learning. She discussed the concept of modularizing skills for long-horizon tasks and the importance of ecological validity. A notable contribution is the BEHAVIOR environment, a large-scale benchmark for robotic learning in virtual settings, designed to bridge the gap between simulation and real-world performance.

The Role of AI in Society and the Digital Divide

Fei-Fei Li addressed the growing digital divide and the necessity for accessible AI resources. The National AI Research Cloud, an initiative she has helped advocate for, aims to democratize access to AI technology. Ted Brown’s comments on the complexity of language and the capabilities of GPT-3 highlighted the rapid advancements in language models and their implications for understanding human thought processes.

Fei-Fei Li, the Sequoia Capital Professor at Stanford University and co-director of the Stanford Institute for Human-Centered AI, is known for her contributions to cognitively inspired AI, machine learning, deep learning, computer vision, and AI for healthcare. She co-founded and chairs AI4ALL, a nonprofit dedicated to training diverse K-12 students in AI. Her connection to the Beckman Institute began when she was an assistant professor, collaborating with Diane Beck.

Concluding Thoughts

The lectures at the Beckman Brown event underscored the complex relationship between human intelligence, AI, and robotics. Fei-Fei Li’s insights into integrating vision with action in AI, her concerns about the digital divide, and Ted Brown’s exploration of language complexity highlighted the multifaceted challenges and opportunities in this evolving field. As AI advances, understanding its interaction with human cognition and societal needs remains a critical area for exploration and development.

Fei-Fei Li emphasized the need to go beyond object recognition to understand visual scenes comprehensively, introducing the concept of scene graph representation. A scene graph represents objects as nodes and their relationships as edges, aiding visual relationship prediction and zero-shot recognition. Scene graph representations have also become popular for recognizing actions and activities in videos, and the Visual Genome dataset fosters algorithms that understand visual relationships and language descriptions of visual scenes. Computers can now generate story-like descriptions of visual scenes, providing a more complete understanding. Fei-Fei Li reflected on the limits of understanding based solely on 2D projections on the retina, comparing it to Plato’s Allegory of the Cave.
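The nodes-and-edges structure described above can be sketched in a few lines. This is a minimal illustrative data structure, not the Visual Genome API; the class and method names are invented for this sketch.

```python
class SceneGraph:
    """Toy scene graph: objects are nodes, relationships are labeled edges."""

    def __init__(self):
        self.objects = set()    # node labels, e.g. "person"
        self.relations = []     # directed (subject, predicate, object) triples

    def add_relation(self, subj, pred, obj):
        # Each edge is a (subject, predicate, object) triple,
        # e.g. ("person", "riding", "horse").
        self.objects.add(subj)
        self.objects.add(obj)
        self.relations.append((subj, pred, obj))

    def triples(self):
        return list(self.relations)


# A scene like "a person riding a horse on grass" becomes:
g = SceneGraph()
g.add_relation("person", "riding", "horse")
g.add_relation("horse", "standing on", "grass")
```

Because the triples are symbolic, a model that has learned "riding" between people and horses can generalize it to unseen subject–object pairs, which is the intuition behind the zero-shot recognition mentioned above.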

The link between perception and action is crucial, as illustrated by studies on kittens and by mirror neurons in primates. Robot learning, a field facing challenges in unstructured environments, is focusing on curiosity-driven and task-driven learning, with visual understanding and representation being key. Fei-Fei Li introduced the BEHAVIOR benchmark for everyday household activities, a large-scale, ecologically valid benchmark for robot learning. She also discussed the creation of a logical form language for robotics, BDDL, enabling robots to represent tasks and check whether they have been accomplished.
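The idea of a logical task language can be illustrated with a toy goal checker: a task is a conjunction of symbolic predicates over the world state, and the task is done when all of them hold. This sketch is hypothetical and does not use BDDL's actual syntax; the predicate names and world representation are invented for illustration.

```python
# World state as a set of ground predicates: (predicate, arg1, arg2, ...).
world = {
    ("inside", "apple", "fridge"),
    ("closed", "fridge"),
    ("on_top", "plate", "table"),
}


def goal_satisfied(goal, state):
    """A goal is a conjunction of ground predicates; it holds
    when every predicate appears in the current world state."""
    return all(tuple(p) in state for p in goal)


# Task in the spirit of "put the apple in the fridge and close the door":
goal = [("inside", "apple", "fridge"), ("closed", "fridge")]
done = goal_satisfied(goal, world)
```

The appeal of such a logical form is that success is defined by the final symbolic state, not by any particular motion sequence, so very different policies can be scored against the same task definition.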

Addressing audience questions, Fei-Fei Li spoke about the influence of psychology and human neuroscience on her work, the role of affective computing (the computing of emotions) in learning, and her preference for a modular approach to robot learning. She questioned what AGI really means, discussed social learning in robots, and acknowledged the computational costs of modern computer vision. Lastly, she highlighted efforts to close the digital divide in AI research and access to resources, including advocacy for a national AI research cloud, and the open questions around how well AI can process language and thought.


Notes by: Ain