Fei-Fei Li (Stanford Professor) – Octopus, Kittens & Babies (Jul 2020)


Chapters

00:00:00 Computer Vision: From Neuroscience Inspiration to the Search for North Star Problems
00:10:34 Cognitive Neuroscience's Influence on Computer Vision
00:14:24 Plato's Cave: From ImageNet to Storytelling and Beyond
00:19:35 Seeing Is for Doing: Active Perception and Interaction in Intelligence
00:28:13 Key Point Representation for Robot Manipulation
00:31:47 Learning Structured Task Representation for Long Horizon Interactive Tasks

Abstract

Vision and Intelligence: The Evolutionary Journey of Computer Vision in AI



This article traces the evolution of computer vision from early neural network algorithms to its central role in robotics and AI today. Fei-Fei Li’s insights shed light on the field’s quest for North Star problems and on the watershed moment marked by the Deep Learning Revolution and ImageNet. The article then covers cognitive neuroscience’s influence on computer vision, the impact of ImageNet, and the emergence of storytelling and scene graphs in image understanding. For robotics and AI, it examines the philosophy of active perception, the importance of tool use in robots, and approaches to long horizon interactive tasks, highlighting the inseparable link between perception and action in intelligent systems.



1. The Visionary Journey of Computer Vision:

Fei-Fei Li, an expert in AI vision, unveils the fascinating history of computer vision. Beginning with Larry Roberts’ work in the 1960s and the influence of neuroscience on neural network algorithms, she charts the field’s search for defining problems and the significant gap between the inception of these algorithms in the 1980s and their breakthrough in 2012. Li attributes this gap not just to data and computational advancements but also to the pursuit of foundational challenges guiding research efforts.

A Brief History of Computer Vision:

Fei-Fei Li highlights the significance of vision as a fundamental aspect of intelligence, crucial in the evolutionary journey of animals and humans. The inception of computer vision in the 1960s was heavily inspired by neuroscience and the pursuit of ‘North Star’ problems. Early research in this field was focused on understanding edges, 3D vision, and object recognition.

The Role of Neuroscience in Neural Network Algorithms:

The pioneering experiments by Hubel and Wiesel on the cat visual cortex offered deep insights into the hierarchical organization of visual processing. These insights were foundational in shaping neural network algorithms, which advanced significantly with the development of backpropagation in the 1980s.

2. The Deep Learning Revolution and ImageNet:

The convergence of computing power, algorithms, and data, particularly the emergence of deep learning architectures like convolutional neural networks, marked a pivotal turn in the field of computer vision. The 2012 ImageNet Challenge and Geoff Hinton’s contributions were instrumental in catalyzing the deep learning revolution, leading to the widespread adoption of neural networks in image classification and detection tasks.

ImageNet and the Rise of Neural Networks:

The development of ImageNet and the associated ImageNet Challenge acted as a guiding force for computer vision, establishing a benchmark and roadmap for progress. Neural networks, through their dominance in the ImageNet Challenge, led to significant advances in tasks such as image classification and detection. Architectures like AlexNet, GoogLeNet, VGGNet, and ResNet emerged as successful models in these domains.

3. Cognitive Neuroscience’s Impact and the Rise of Neural Networks:

Cognitive neuroscience has had a profound impact on computer vision by revealing how the brain processes visual information. Research in the 1970s and 1990s on object detection and categorization in the human brain inspired a greater focus on object recognition in computer vision. This led to the creation of medium-sized datasets like Pascal VOC and the ImageNet dataset, which captured the complexity of visual concepts.

Cognitive Neuroscience’s Seminal Contributions:

Research in the 1970s showed that human vision can detect and recognize objects even from very brief exposures. Simon Thorpe’s 1996 study further revealed that the human brain can perform complex object categorization tasks within about 150 milliseconds, underscoring how fundamental this capability is.

Nancy Kanwisher’s Findings:

Nancy Kanwisher’s research identified specific brain areas responsible for object categorization, particularly for categories like faces, places, and body parts.

Impact on Computer Vision:

The insights from cognitive neuroscience research served as a catalyst for computer vision’s exploration of object recognition at the category level, guiding the field’s focus in the early 21st century. The emergence of datasets like Pascal VOC facilitated this focus on object recognition.

Fei-Fei Li’s Personal Journey:

Fei-Fei Li’s involvement in computer vision was influenced by her work with Pietro Perona and her exposure to George Miller’s research on organizing concepts through WordNet. Inspired by the vast scale of concepts and categories in the human mind, Li and her colleagues developed the ImageNet dataset.

4. From Image Classification to Relationship Understanding:

Fei-Fei Li’s interest in storytelling through images led her to research image captioning and dense captioning, moving image descriptions from simple labels to detailed sentences. Li and her students also studied relationships between objects, which led to scene graphs, a representation that connects the entities in a visual scene through their relationships. Scene graphs improved image retrieval and relationship understanding. Li nonetheless criticizes this static, passive view of vision, likening it to Plato’s allegory of the cave, and argues that it limits our understanding of the visual world.
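To make the scene graph idea concrete, here is a minimal Python sketch of a graph stored as (subject, predicate, object) triples, with retrieval by triple matching. The class names, the `retrieve` helper, and the example images are hypothetical illustrations, not taken from the talk or from any published scene graph system, which would infer the graph from pixels rather than build it by hand.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One directed edge of a scene graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Hypothetical container: entities are nodes, relations are edges."""
    image_id: str
    objects: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def add(self, subject, predicate, obj):
        # Register both entities and the relationship that links them.
        self.objects.update((subject, obj))
        self.relations.add(Relation(subject, predicate, obj))


def retrieve(graphs, query):
    """Return ids of images whose graph contains every queried triple."""
    wanted = {Relation(*triple) for triple in query}
    return [g.image_id for g in graphs if wanted <= g.relations]


# Two hand-built example graphs (assumed data, for illustration only).
g1 = SceneGraph("img_001")
g1.add("man", "riding", "horse")
g1.add("man", "wearing", "hat")

g2 = SceneGraph("img_002")
g2.add("man", "feeding", "horse")

matches = retrieve([g1, g2], [("man", "riding", "horse")])
print(matches)
```

Both images contain a man and a horse, so a label-only search could not tell them apart; matching on the relationship is what separates them.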

5. Active Perception and Interaction in Intelligent Systems:

The philosophy of “seeing is for doing” underscores the active nature of vision and intelligence. The Cambrian explosion, 530 million years ago, marked a rapid diversification of species driven by sensory perception, especially vision, which enabled animals to move and explore and led to the evolution of sophisticated intelligence. Early human development likewise involves active exploration and manipulation of the environment, where perception, action, and navigation work together to stimulate the brain. This concept has significantly influenced AI, highlighting the importance of active perception. The Held and Hein kitten experiment in the 1960s demonstrated the vital link between perception and action: the kitten allowed to move freely developed better motor and perception systems than the one that was confined.

6. Tool Usage and Long Horizon Tasks in Robotics:

In robotics, the challenge lies in turning high-dimensional visual data into actions, particularly for tool use. Li discusses an encoder-decoder network that maps visual information to actions through key points. Key point representation, a long-established idea in computer vision, is used to infer grasp points, function points, and effect points on each tool, which also helps in assembling new tools for specific tasks. For long horizon interactive tasks, structured task representation and one-shot visual imitation are central. Neural task programming and neural task graph inference, the latter relying on less supervision, are used to improve generalization and task execution on complex, multi-step tasks.
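As a rough illustration of why key points are a convenient interface between perception and action, the Python sketch below stores the three key points named above and derives a gripper position from them. Everything here is an assumption made for illustration: the `ToolKeypoints` type, the `grasp_position_for_target` helper, the coordinates, and the translation-only motion model are hypothetical, and the key points are hard-coded rather than predicted by a network as in the work Li describes.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class ToolKeypoints:
    """Hypothetical three-point summary of a tool (coordinates in meters)."""
    grasp: Point     # where the gripper holds the tool
    function: Point  # where the tool touches the target
    effect: Point    # stored for completeness; unused in this sketch


def grasp_position_for_target(kp: ToolKeypoints, target: Point) -> Point:
    """Gripper position that places the function point on the target.

    Assumes the tool is only translated, never rotated, so the offset
    from function point to grasp point stays constant.
    """
    return tuple(t + (g - f) for t, g, f in zip(target, kp.grasp, kp.function))


# Assumed example: a hammer held at the handle, head 0.25 m above the grasp.
hammer = ToolKeypoints(
    grasp=(0.0, 0.0, 0.0),
    function=(0.0, 0.0, 0.25),
    effect=(0.125, 0.0, 0.25),
)

goal = grasp_position_for_target(hammer, target=(1.0, 2.0, 0.5))
print(goal)
```

The point of the sketch is the reduction in dimensionality: once a tool is summarized by a few points, planning a motion becomes simple geometry instead of reasoning over raw pixels.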

7. The Loop between Perception and Action:

In conclusion, the intertwined journey of vision and robotics emphasizes the fundamental link between perception and action in intelligent systems. The article encapsulates the evolution of computer vision, its impact on AI, and the ongoing research that continues to push the boundaries of what is possible in interactive real-world tasks, drawing inspiration from diverse fields like cognitive neuroscience and evolutionary biology.



This comprehensive overview of the field of computer vision and its integration with robotics and AI not only encapsulates the historical development and current state of the art but also sets the stage for future innovations and discoveries in this dynamic and ever-evolving field.


Notes by: Flaneur