Geoffrey Hinton (Google Scientific Advisor) – How to represent part-whole hierarchies in a neural net (Jul 2021)


Chapters

00:00:01 Unifying Representation GLOM by Geoff Hinton
00:02:04 GLOM: A Hierarchical System of Embeddings for Visual Perception
00:10:50 Perceiving Structure in Complex Objects
00:14:45 Structural Descriptions and Viewpoint Assignment in Mental Images
00:18:53 Contrastive Self-Supervised Learning for Image Representation Extraction
00:22:18 Spatial Coherence in Neural Networks
00:25:20 Hierarchical Neural Fields: A New Framework for Object Recognition
00:33:47 Transformer-Based Geometric Understanding
00:36:05 Composing High-Level Concepts from Sparse Feature Detectors
00:44:41 Neuromorphic Networks: Using Hardware for Thinking About Consciousness
00:47:05 Self-Supervised Learning for 3D Scenes: From Object-Centric
00:49:38 Falsifying Theories of Brain Function Through Engineering
00:55:57 Unsupervised Learning Through Contrastive Representation
00:59:00 Neural Networks: Explaining Neuropsychological Phenomena

Abstract



Geoffrey Hinton’s Visionary Leap in AI: Unraveling the Intricacies of the Human Mind and Visual Perception

Geoffrey Hinton, an acclaimed AI pioneer from the University of Toronto and Google, has significantly advanced the field of artificial intelligence. His groundbreaking work includes the development of the GLOM neural network, a transformative approach that aims to closely mimic human visual processing. The system, which combines elements of transformers, unsupervised learning, and generative models, seeks to understand and replicate the hierarchical and psychological realities of human vision. Hinton’s contributions extend beyond GLOM, encompassing breakthroughs in deep neural networks that earned him the Turing Award in 2018, as well as innovative work in contrastive self-supervised learning and the efficient representation of complex visual structures. His insights into mental imagery, structural parsing, and the handling of ambiguous visual information mark a significant step toward bridging the gap between human cognitive abilities and artificial intelligence.

Introduction of Geoff Hinton:

Geoff Hinton is a luminary in the field of artificial intelligence, known for his pioneering work on Boltzmann machines, variational learning, capsule networks, and data visualization. He popularized the backpropagation algorithm, a fundamental technique in deep learning, and played a key role in introducing word embeddings in natural language processing. His transformative contributions to the field earned him the Turing Award in 2018, which he shared with Yann LeCun and Yoshua Bengio.

Segment Summaries and Main Ideas

Geoff Hinton’s Contributions and Current Work:

Geoffrey Hinton’s pioneering work in AI, especially his development of the GLOM framework, has significantly influenced the field’s understanding of visual perception. GLOM’s approach to image processing emphasizes part-whole hierarchies and coordinate frames, providing a fresh perspective on how humans perceive and interpret visual information. Hinton’s structural description of a crown, drawn from a 1979 diagram, consists of objects with intrinsic coordinate frames and transforms between objects and their parts, reminiscent of how old-fashioned computer graphics represented geometry.

Conceptual Framework of GLOM:

GLOM’s architecture, inspired by human vision, encompasses a hierarchical structure of embedding vectors and neural fields to represent and process images. This system mirrors the human brain’s method of visual perception, focusing on parts and wholes, and adeptly manages the complexity of visual information. Each node in a structural description can have an associated coordinate transform to the viewer, facilitating the propagation of consistent viewpoint information throughout the structural description. This allows for efficient computation of relationships between distant parts in the structure.
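The talk gives no code, but the idea of propagating a consistent viewpoint through a structural description can be sketched with homogeneous coordinate transforms. In the toy example below, the transform names and numbers are invented for illustration; the point is only that a viewer-to-part relation falls out of composing viewer-to-object and object-to-part transforms:

```python
import numpy as np

def make_transform(rotation_deg, translation):
    """Homogeneous 2D transform: rotate about the origin, then translate."""
    t = np.radians(rotation_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, translation[0]],
                     [s,  c, translation[1]],
                     [0., 0., 1.]])

# Hypothetical transforms: viewer -> object, and object -> one of its parts.
viewer_to_object = make_transform(30, (2.0, 1.0))
object_to_part = make_transform(-30, (0.5, 0.0))

# Propagating viewpoint down the hierarchy is just matrix composition, so
# the viewer -> part relation is computed without revisiting the pixels.
viewer_to_part = viewer_to_object @ object_to_part
```

Because composition is associative, the same trick yields the relationship between any two nodes in the hierarchy, however distant, by chaining the transforms along the path between them.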

Understanding Human Perception with GLOM:

Hinton’s insights into the psychological reality of human vision, demonstrated through experiments like cube perception and sentence ambiguity, highlight the complexity and multidimensionality of human perception. GLOM’s design reflects these intricacies, offering a pathway to more human-like AI. Hinton argues that mental images are not pixel-based but involve viewpoint assignment, allowing relationships between different parts of a mental image to be computed. Mental maps, for instance, are typically oriented with north pointing up. He also gives an example of a mental-imagery task involving navigation and orientation, further illustrating the use of viewpoint information in mental images.

The Role of Contrastive Self-Supervised Learning:

Contrastive self-supervised learning, a method Hinton introduced and refined, is crucial to GLOM’s ability to process and interpret visual information. The technique learns similar representations for different patches of the same image; Hinton first proposed the idea in 1992, though it only gained wide recognition around 2016. SimCLR, a contrastive self-supervised learning algorithm developed in Toronto, takes two crops of an image, computes their representations, reduces their dimensionality through a projection head, and applies a contrastive loss. The method is effective at extracting scene representations and, on tasks like ImageNet, object representations. Once the representations are trained, a linear classifier with a softmax achieves performance comparable to supervised methods while training only the final layer of weights.
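As a rough illustration of the contrastive objective behind SimCLR-style training, here is a minimal NumPy sketch of an NT-Xent-like loss. This is a simplification of the published method (no encoder or projection head is shown), and the function name and batch values are invented for the example:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Simplified NT-Xent contrastive loss over two batches of projected
    representations, where z1[i] and z2[i] come from two crops of the
    same image. A sketch of the objective, not the exact SimCLR recipe."""
    n = len(z1)
    z = np.concatenate([z1, z2])                       # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature                        # (2n, 2n)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    # Index of each row's positive partner (the other crop of its image).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    # Cross-entropy: the positive pair should dominate all other pairs.
    return float(np.mean(logsumexp - sim[np.arange(2 * n), pos]))
```

Intuitively, the loss is small when the two crops of each image map to nearly identical directions and large when they look no more alike than crops of different images.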

The Architectural Complexity and Innovations in GLOM:

GLOM’s intricate architecture, involving multiple levels of embeddings, complex coordinate transformations, and echo chambers, showcases an advanced approach to handling ambiguity and representing visual structures in a way that mimics the human brain’s processing capabilities. Inspired partly by biology, GLOM aims to achieve spatial coherence of familiar shapes through attention and to represent an object with identical vectors at the object level across all of its locations. The architecture replicates weights across columns, similar to the replication of knowledge in cells, with a column of hardware dedicated to each small patch of the image.
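The attention-driven agreement between columns can be caricatured as repeated similarity-weighted averaging: columns with similar embeddings pull each other toward a shared vector, forming islands of agreement. The four-column example below is a toy illustration with invented values, not GLOM’s actual update rule:

```python
import numpy as np

def attention_step(embeddings, temperature=1.0):
    """One round of similarity-weighted averaging across columns at the
    same level: each column attends mostly to columns with similar
    embeddings, pulling patches of one object toward a common vector."""
    sim = embeddings @ embeddings.T / temperature
    weights = np.exp(sim - sim.max(axis=1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ embeddings

# Four hypothetical columns: two over one object, two over another.
cols = np.array([[ 1.0,  0.1], [ 0.9, -0.1],
                 [-1.0,  0.1], [-0.9, -0.1]])
for _ in range(10):
    cols = attention_step(cols, temperature=0.1)
```

After a few iterations the two columns over each object converge to nearly identical vectors while the two objects remain distinct, which is the "identical vectors at the object level" behavior described above in miniature.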

GLOM’s Implementation and Future Potential:

The implementation of GLOM, utilizing techniques like unsupervised learning and knowledge transfer, demonstrates its versatility in AI, capable of addressing complex visual tasks and advancing our understanding of both artificial and human intelligence. Vision is conceptualized as a sampling process, where the focus is on detailed processing of specific areas.

Hinton’s Perspective on AI and Neuroscience:

Hinton’s work, while primarily focused on artificial intelligence, also provides significant insights into neuroscience and cognitive science. He advocates the use of neural networks in scientific research and validates brain models through computational approaches, creating a unique intersection between AI and human cognition.



Geoffrey Hinton’s development of the GLOM neural network exemplifies the potential of AI in mimicking and understanding human cognitive processes. By bridging AI, visual perception, and cognitive science, his contributions deepen our understanding of both artificial and human intelligence, opening new paths for exploration and innovation. His work not only enhances our comprehension of the complex nature of the human mind but also paves the way for more advanced and human-like artificial intelligence systems.

Update:

Hinton’s recent work further explores GLOM’s architecture and functionality, revealing a three-level embedding system within each column of the hierarchical neural fields model. He emphasizes the bottom-up, top-down, and local interactions that shape these embeddings and introduces transformational random fields for disambiguating parts, enabling the model to locate objects relative to each other. He also proposes an iterative network that adds log probabilities to combine distributions and select the most likely interpretation of an ambiguous part.
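Combining distributions by adding log probabilities can be shown with a small product-of-experts-style sketch. The interpretations and probability values below are invented for illustration; only the combination rule reflects the idea in the talk:

```python
import numpy as np

# Hypothetical interpretations of an ambiguous part.
interpretations = ["table leg", "chair leg", "chair arm"]

# Assumed probability distributions from three sources of evidence.
bottom_up = np.array([0.5, 0.3, 0.2])   # from the level below
top_down  = np.array([0.2, 0.6, 0.2])   # from the level above
lateral   = np.array([0.3, 0.5, 0.2])   # from nearby columns

# Combining independent evidence means adding log probabilities,
# then renormalizing (a product-of-experts combination).
log_combined = np.log(bottom_up) + np.log(top_down) + np.log(lateral)
combined = np.exp(log_combined - log_combined.max())
combined /= combined.sum()

most_likely = interpretations[int(np.argmax(combined))]
```

Note that an interpretation favored by two of the three sources can win even when a single source ranks it first only weakly, which is what makes the combination useful for disambiguation.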

Supplemental Update:

Hinton’s unsupervised learning system learns a hierarchical representation of images without relying on labeled data, using partial autoencoders and contrastive learning. He suggests studying brain injuries to validate brain models and explores neural networks with attractors to model semantic meaning in language. These developments continue to bridge the gap between AI and neuroscience, providing valuable insights into the human mind.


Notes by: ZeusZettabyte