Fei-Fei Li (Stanford Professor) – “From Seeing to Doing” (Oct 2022)


Chapters

00:00:05 Beckman Brown Lecture on Interdisciplinary Science
00:03:57 From Seeing to Doing: Understanding and Interacting with the Real World
00:07:12 Vision and the Evolution of Intelligence
00:13:48 Evolution of Object Recognition in AI
00:22:18 Visual Relationships and Scene Graph Representation for Scene Understanding
00:27:59 Ecological Approach to Robotic Learning in Household Environments
00:41:45 Modular vs End-to-End Approaches in Robotics Learning
00:55:24 Challenges and Considerations in Robotic Learning
01:05:41 National AI Research Cloud: Addressing the Digital Divide
01:08:08 Advancements in AI Language Models

Abstract

Bridging Visions and Actions: Exploring the Intersection of AI, Human Intelligence, and Robotics

Introduction

The Beckman Brown Lecture on Interdisciplinary Science, an event honoring Arnold O. Beckman and Dr. Ted Brown, recently delved into the interplay of artificial intelligence, human cognition, and robotics. Supported by the Arnold and Mabel Beckman Foundation, the lecture featured insights from Fei-Fei Li, a renowned computer scientist whose work draws on human neuroscience, along with remarks from Ted Brown, the Beckman Institute’s founding director. The discussion traced the evolution of vision from the Cambrian period to the latest developments in AI and robotics, emphasizing the critical role of vision in both natural and artificial systems.

Diane Beck, interim head of the Department of Psychology and a member of the Beckman Institute Executive Committee, opened the event. Ted Brown reminisced about the Beckmans and acknowledged the Foundation’s support, including an endowment for an annual lectureship and a postdoctoral fellowship. He introduced Aman Ahmed, the newly appointed postdoctoral fellow, who will collaborate with Martha Gillette and Brad Sutton. The Beckman Institute, since its inception in 1989, has included a group dedicated to the cognitive sciences, recognizing the field’s emerging significance. The concept of embodiment, introduced in the 1991 book “The Embodied Mind” by Varela, Thompson, and Rosch, has since spurred extensive research in this field. Dr. Fei-Fei Li’s lecture exemplifies the reach of this concept and our connection with the world through AI. Diane Beck concluded the introduction and welcomed Dr. Li to speak.

The Evolution of Vision and Its Impact

The Cambrian explosion, a crucial event 540 million years ago, was likely triggered by the evolution of vision, as theorized by Andrew Parker. This event, which led to a rapid diversification of animal species, underscores the vital role of vision in survival and navigation, with most animals relying heavily on it. Fei-Fei Li’s research aims to extend our understanding of human vision, highly efficient in object recognition, to machines, developing intelligent systems with advanced visual capabilities.

AI’s Journey in Object Recognition

Initially, computer vision models relied on hand-designed algorithms, but the field evolved significantly with the introduction of machine learning. The creation of large datasets, particularly ImageNet, with 15 million images across 22,000 object classes, marked a turning point. This development culminated in the 2012 breakthrough of convolutional neural networks on the ImageNet challenge, heralding a new era of deep learning and continually improving image-recognition capabilities. Fei-Fei Li stressed the importance of moving beyond mere object recognition to understanding the relationships between objects in visual scenes, as embodied in the Visual Genome dataset.

Vision, Perception, and Action in Intelligence

Fei-Fei Li argued that vision alone is not enough for a complete understanding of the world and advocated for an integrated approach, coupling vision with action. This perspective is vital in robotics, where learning is driven by interaction with the environment. The discussion concluded with an affirmation of vision’s fundamental role in intelligence, enabling understanding and action in the world, and its critical importance for developing intelligent machines and advancing AI.

Advancements in Robotic Learning

Li introduced various approaches in robotic learning, including curiosity-driven and task-driven learning. She discussed the concept of modularizing skills for long-horizon tasks and the importance of ecological validity. A notable contribution is the BEHAVIOR environment, a large-scale benchmark for robotic learning in virtual settings, designed to bridge the gap between simulation and real-world performance.

The Role of AI in Society and the Digital Divide

Fei-Fei Li addressed the growing digital divide and the necessity for accessible AI resources. The National AI Research Cloud, an initiative she has helped advocate for, aims to democratize access to AI technology. Ted Brown’s comments on the complexity of language and the capabilities of GPT-3 highlighted the rapid advancements in language models and their implications for understanding human thought processes.

Fei-Fei Li, the Sequoia Capital Professor at Stanford University and co-director of the Stanford Institute for Human-Centered AI, is known for her contributions to cognitively inspired AI, machine learning, deep learning, computer vision, and AI for healthcare. She co-founded and chairs AI4ALL, a nonprofit dedicated to training diverse K-12 students in AI. Her connection to the Beckman Institute began when she was an assistant professor, collaborating with Diane Beck.

Concluding Thoughts

The lectures at the Beckman Brown event underscored the complex relationship between human intelligence, AI, and robotics. Fei-Fei Li’s insights into integrating vision with action in AI, her concerns about the digital divide, and Ted Brown’s exploration of language complexity highlighted the multifaceted challenges and opportunities in this evolving field. As AI advances, understanding its interaction with human cognition and societal needs remains a critical area for exploration and development.

Fei-Fei Li emphasized the need to go beyond object recognition to understand visual scenes comprehensively, introducing the concept of scene graph representation. A scene graph represents objects as nodes and their relationships as edges, aiding visual relationship prediction and zero-shot recognition. Scene graph representations have also become popular for recognizing actions and activities in videos, and the Visual Genome dataset fosters algorithms that understand visual relationships and language descriptions of visual scenes. Computers can now generate story-like descriptions of visual scenes, providing a more complete understanding. Fei-Fei Li reflected on the limits of understanding based solely on 2D projections on the retina, comparing it to Plato’s Allegory of the Cave.
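The nodes-and-edges structure described above can be sketched in a few lines. This is a minimal illustrative data structure, not the Visual Genome API; the class and method names are invented for this sketch.

```python
class SceneGraph:
    """Toy scene graph: objects are nodes, relationships are labeled edges."""

    def __init__(self):
        self.objects = set()    # node labels, e.g. "person"
        self.relations = []     # directed (subject, predicate, object) triples

    def add_relation(self, subj, pred, obj):
        # Each edge is a (subject, predicate, object) triple,
        # e.g. ("person", "riding", "horse").
        self.objects.add(subj)
        self.objects.add(obj)
        self.relations.append((subj, pred, obj))

    def triples(self):
        return list(self.relations)


# A scene like "a person riding a horse on grass" becomes:
g = SceneGraph()
g.add_relation("person", "riding", "horse")
g.add_relation("horse", "standing on", "grass")
```

Because the triples are symbolic, a model that has learned "riding" between people and horses can generalize it to unseen subject–object pairs, which is the intuition behind the zero-shot recognition mentioned above.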

The link between perception and action is crucial, as illustrated by studies on kittens and by mirror neurons in primates. Robot learning, a field facing challenges in unstructured environments, is focusing on curiosity-driven and task-driven learning, with visual understanding and representation being key. Fei-Fei Li introduced the BEHAVIOR benchmark for everyday household activities, a large-scale, ecologically valid benchmark for robot learning. She also discussed the creation of a logical form language for robotics, BDDL, enabling robots to represent tasks and check whether they have been accomplished.
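The idea of a logical task language can be illustrated with a toy goal checker: a task is a conjunction of symbolic predicates over the world state, and the task is done when all of them hold. This sketch is hypothetical and does not use BDDL's actual syntax; the predicate names and world representation are invented for illustration.

```python
# World state as a set of ground predicates: (predicate, arg1, arg2, ...).
world = {
    ("inside", "apple", "fridge"),
    ("closed", "fridge"),
    ("on_top", "plate", "table"),
}


def goal_satisfied(goal, state):
    """A goal is a conjunction of ground predicates; it holds
    when every predicate appears in the current world state."""
    return all(tuple(p) in state for p in goal)


# Task in the spirit of "put the apple in the fridge and close the door":
goal = [("inside", "apple", "fridge"), ("closed", "fridge")]
done = goal_satisfied(goal, world)
```

The appeal of such a logical form is that success is defined by the final symbolic state, not by any particular motion sequence, so very different policies can be scored against the same task definition.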

Addressing audience questions, Fei-Fei Li spoke about the influence of psychology and human neuroscience on her work, the role of affective computing (the computing of emotions) in learning, and her preference for a modular approach to robot learning. She questioned what AGI really means, discussed social learning in robots, and acknowledged the computational costs of modern computer vision. Lastly, she highlighted efforts to close the digital divide in AI research and access to resources, including advocacy for a national AI research cloud, and the open questions around how well AI can process language and thought.


Notes by: Ain