Geoffrey Hinton (Google Scientific Advisor) – How to represent part-whole hierarchies in a neural net (Jul 2021)
Chapters
00:00:01 A Unifying Representation: GLOM, by Geoff Hinton
Introduction of Geoff Hinton: Geoff Hinton, a distinguished scholar in artificial intelligence, is introduced as the speaker for the second keynote session of the SGP. His academic journey is highlighted, including his PhD from the University of Edinburgh, postdocs at the University of Sussex and the University of California, San Diego, and faculty positions at Carnegie Mellon University and the University of Toronto. He is currently an Emeritus Distinguished Professor at the University of Toronto and works part-time at Google.
Contributions to Artificial Intelligence: Hinton’s extensive contributions to artificial intelligence are mentioned, encompassing Boltzmann machines, variational learning, capsule networks, and data visualization. He played a pivotal role in introducing and popularizing the backpropagation algorithm, a cornerstone of modern machine learning, and was among the first to use backpropagation to learn word embeddings, a significant advance in natural language processing.
Recognition and Awards: Hinton’s achievements were recognized with the prestigious Turing Award in 2018, shared with Yann LeCun and Yoshua Bengio. This award acknowledged their conceptual and engineering breakthroughs that transformed deep neural networks into a critical component of computing.
Keynote Address Topic: Hinton’s keynote address will focus on his latest work, a unifying representation called GLOM. The audience is invited to learn about this innovative concept from Hinton himself.
00:02:04 GLOM: A Hierarchical System of Embeddings for Visual Perception
Background: Traditional neural networks are limited by their inability to dynamically represent hierarchical structures like parse trees, which are essential for understanding images. Geoffrey Hinton proposes GLOM, a novel neural network architecture inspired by recent advancements in transformers, unsupervised learning, and generative models.
GLOM: GLOM is a conceptual design for a neural network that simulates the human visual system. It uses a hierarchical structure of embeddings to represent different levels of abstraction, from individual pixels to complex objects. These embeddings are dynamically updated to form “islands of agreement,” which represent the parse tree of the image. The network settles somewhat like an Ising model or Markov random field, but with real-valued embedding vectors rather than binary states, and it uses coordinate transformations to communicate between levels.
Part-Whole Hierarchy and Coordinate Frames: Hinton emphasizes the psychological reality of the part-whole hierarchy and coordinate frames in human vision. He argues that we perceive images by breaking them down into parts and subparts, which are then organized into a hierarchical structure. Hinton provides a demonstration to illustrate the use of coordinate frames in visual perception.
Conclusion: GLOM remains a theoretical concept, but it serves as a starting point for exploring how the human visual system might work. Hinton encourages further research and collaboration to refine the design and investigate its potential applications.
The Cube Experiment: Geoffrey Hinton presents an experiment involving a cube placed in an unusual orientation to explore human visual perception.
Missing Corners: Holding the cube by two diagonally opposite corners and asked to point out the remaining corners, many people place only four, arranged in a square, even though six corners remain.
Zigzag Ring: The six missing corners form a zigzag ring structure, unfamiliar to most people.
Different Interpretations: The cube can be perceived in different ways, such as a crown or a hexahedron, leading to distinct interpretations.
Coordinate Frames: Human vision is sensitive to alignment with rectangular coordinate frames rather than specific shapes or features.
Right Angles: When the cube is perceived as a tilted cube, humans are acutely sensitive to right angles and notice even slight deformations.
Multiple Representations: Unlike the Necker cube, whose two interpretations correspond to different realities, the different interpretations of this cube represent the same reality, much as a single sentence can have two different parses that share the same truth conditions.
00:14:45 Structural Descriptions and Viewpoint Assignment in Mental Images
Structural Descriptions for Objects: Hinton presents a structural description of a crown, using a diagram from 1979. This description includes objects with intrinsic coordinate frames and coordinate transforms between objects and parts. It resembles old-fashioned computer graphics representations of geometry.
The Relationship Between Nodes and Viewpoint: Each node in a structural description can have an associated coordinate transform to the viewer. This allows for propagation of consistent viewpoint information over the entire structural description. It enables efficient computation of relationships between distant parts in the structure.
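To make the viewpoint-propagation idea concrete, here is a minimal Python sketch under simplifying assumptions: poses are 2-D rigid transforms stored as 3x3 homogeneous matrices, and the crown/flap names and all numbers are purely illustrative, not taken from Hinton’s diagram.

```python
# Minimal sketch: propagating viewpoint information through a structural
# description. Poses are 2-D homogeneous transforms (3x3 numpy arrays).
import numpy as np

def pose(tx, ty, theta):
    """Build a 2-D rigid transform: rotate by theta, then translate."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0, 1.0]])

# Object-to-part transform from the structural description (illustrative).
crown_to_flap = pose(0.0, 1.0, np.pi / 6)   # where a flap sits on the crown

# Viewpoint: a single crown-to-viewer transform assigned at the root node.
crown_to_viewer = pose(2.0, 3.0, np.pi / 4)

# Propagation: each node's viewer transform is its parent's viewer
# transform composed with the parent-to-node transform.
flap_to_viewer = crown_to_viewer @ crown_to_flap

# Relationships between distant parts are now cheap to compute, e.g.,
# where the flap's origin lands in viewer coordinates.
print(flap_to_viewer @ np.array([0.0, 0.0, 1.0]))
```

The point the sketch illustrates is that assigning one viewer transform at the root suffices: composition propagates consistent viewpoint information to every node in the description.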
Mental Images: Hinton argues that mental images are not made of pixels but involve the assignment of viewpoint information: they have viewpoints and allow relationships between different parts of the image to be computed. They also have a definite scale (neither tiny nor huge) and are typically oriented with north vertical.
Mental Imagery Task: Hinton provides an example of a mental imagery task involving navigating a series of directions and determining the direction back to the starting point. The way people approach this task suggests that mental images have viewpoints and that viewpoint information is used to compute relationships.
Characteristics of Mental Images: Taken together, these observations suggest that mental images carry assigned viewpoint information, which is what makes the efficient computation of spatial relationships possible.
00:18:53 Contrastive Self-Supervised Learning for Image Representation Extraction
Introduction: Contrastive self-supervised learning aims to extract similar representations from different patches of an image. The idea was first proposed in 1992 (by Becker and Hinton) but only gained wide recognition much later.
SimCLR: SimCLR is a contrastive self-supervised learning algorithm developed in Toronto. It takes two augmented crops of an image (x_i and x_j) and passes them through a deep neural network f to obtain representations (h_i and h_j). These representations are then mapped to a lower-dimensional space by a projection head g, and a contrastive loss is applied that makes z_i and z_j agree if they come from the same image and disagree if they come from different images.
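A toy numpy sketch of this contrastive objective may help; it is not the real SimCLR implementation. Random linear maps stand in for the encoder f and projection head g, the “augmentation” is additive noise, and negatives are drawn only from the other view of the batch (the actual NT-Xent loss uses all 2N-1 other examples).

```python
# Toy sketch of a SimCLR-style contrastive loss, with random linear maps
# standing in for the encoder f and projection head g.
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=(64, 32))   # stand-in encoder: 64-dim crop -> 32-dim h
g = rng.normal(size=(32, 16))   # stand-in projection head: h -> 16-dim z

def embed(x):
    z = x @ f @ g
    return z / np.linalg.norm(z, axis=-1, keepdims=True)  # unit-norm z

# Two augmented "crops" per image, for a batch of 4 images.
crops_i = rng.normal(size=(4, 64))
crops_j = crops_i + 0.1 * rng.normal(size=(4, 64))  # stand-in augmentation

zi, zj = embed(crops_i), embed(crops_j)
tau = 0.5
sim = zi @ zj.T / tau                   # cosine similarities / temperature
# For each z_i, the matching z_j is the positive; the other images in the
# batch serve as negatives (simplified relative to full NT-Xent).
loss = -np.mean(np.diag(sim) - np.log(np.exp(sim).sum(axis=1)))
print("contrastive loss:", loss)
```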
Applications: Contrastive self-supervised learning is effective in extracting representations of scenes and, in cases like ImageNet, objects.
Classification: After obtaining representations using contrastive self-supervised learning, a linear classifier with softmax can be used for classification. This approach achieves comparable performance to supervised learning methods while only training the final layer of weights.
Limitation: Contrastive self-supervised learning is not intuitively satisfying for object recognition: two crops of the same image may contain different objects, yet the objective still forces their representations to agree.
Overview of GLOM: GLOM is a neural network architecture designed to overcome exactly this problem of forcing the same output vector onto patches of an image that contain different objects. It uses attention so that agreement is sought only where it is spatially coherent with familiar shapes.
Goal: GLOM aims to obtain an island of identical vectors at the object level, so that, for example, every location on a face shares the same face vector even while the nose and mouth locations carry different part-level vectors. Converting different representations of the same object into identical representations enables the extraction of common structure.
Inspiration from Biology: GLOM is partly inspired by biology: every cell carries a complete set of instructions for making proteins, and the environment determines which proteins are actually expressed. The vector of protein expressions in a cell is analogous to the vector of multi-level embeddings in a GLOM column, and an organ, composed of cells with similar expression vectors, is analogous to an island of agreement representing an object.
Replicating Weights and Knowledge: GLOM replicates the same weights across all columns, just as every cell replicates the full genome. The replication looks wasteful but makes local processing simple and efficient.
Hardware Equivalents: A column of hardware representing a small patch of the image is the analogue of a cell, and its stack of embedding vectors at different levels is the analogue of the cell’s protein expression vector.
Vision as a Sampling Process: Vision is a sampling process in which an outer loop of fixations decides which part of the scene to process in detail.
00:25:20 Hierarchical Neural Fields: A New Framework for Object Recognition
Architecture of the Hierarchical Neural Fields Model: This model comprises numerous columns, each containing a three-level embedding system. The embedding at each level is influenced by bottom-up, top-down, and local interactions. Bottom-up interactions predict higher-level embeddings from lower-level ones; top-down interactions predict lower-level embeddings from higher-level ones; and local interactions perform attention-weighted averaging of nearby columns’ embeddings at the same level, promoting agreement among similar embeddings.
Key Characteristics of the Model: The model uses a drastically simplified version of the transformer architecture for attention, with weights given by the exponentiated scalar products of the embeddings themselves (query, key, and value are all the same vector). This lets simple attention-weighted averaging serve as the sole interaction between locations. As the network settles, it forms distinct regions or “islands” of identical embeddings, and these islands provide the segmentation of the image into objects.
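Below is a minimal numpy sketch of the settling process for a single level of embeddings across a row of columns, under stated assumptions: the bottom-up and top-down networks are stand-in random linear maps (the real ones are multi-layer nets applying coordinate transforms), and the equal mixing weights are illustrative.

```python
# Toy sketch of GLOM settling steps for the embeddings at one level
# across a 1-D row of columns. All networks are stand-in linear maps.
import numpy as np

rng = np.random.default_rng(1)
N, D = 8, 16                          # number of columns, embedding dim
level = rng.normal(size=(N, D))       # current embeddings at this level
below = rng.normal(size=(N, D))       # embeddings one level down
above = rng.normal(size=(N, D))       # embeddings one level up
W_up = rng.normal(size=(D, D)) / np.sqrt(D)
W_down = rng.normal(size=(D, D)) / np.sqrt(D)

for _ in range(10):                   # settling iterations
    bottom_up = below @ W_up          # predictions from the level below
    top_down = above @ W_down         # predictions from the level above
    # Simplified attention: the weight on a neighbour is the exponentiated
    # scalar product of the two embeddings (query = key = value).
    logits = level @ level.T
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    neighbour_avg = attn @ level      # attention-weighted same-level average
    # New embedding: equal-weight average of the contributions (illustrative).
    level = 0.25 * (level + bottom_up + top_down + neighbour_avg)
```

Iterating this update is what allows islands of near-identical vectors to emerge at higher levels.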
Handling Object Vector Ambiguities: The model addresses the challenge of using the same object vector for different locations within an object. It employs hierarchical neural fields, where the object vector includes information about the object’s pose relative to image coordinates. This allows the model to generate location-specific predictions, such as distinguishing between a mouth and a nose within the same object.
Neural Fields for Representing Uniform Features: Neural fields are introduced as a means of representing uniform features across multiple pixels. An example is given of a gradient across four pixels, represented by a single set of coefficients (A and B). This representation allows for efficient reconstruction of the pixels from the coefficients.
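As a concrete toy example of such a neural field (writing the coefficients as a and b):

```python
# Toy sketch of a neural field for a uniform gradient: four pixel values
# are represented by two coefficients (a, b) of a linear function of
# position, and are reconstructed on demand from that compact code.
import numpy as np

a, b = 0.5, 1.0              # the compact "code" for this patch
xs = np.arange(4.0)          # the four pixel positions
pixels = a * xs + b          # reconstruction: [1.0, 1.5, 2.0, 2.5]
print(pixels)
```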
Transformational Random Fields for Disambiguating Parts: Transformational random fields are employed to handle ambiguities in identifying object parts. A possible mouth, for instance, can specify the pose of a nose it expects to find in relation to itself. This enables the model to disambiguate the mouth and nose by searching for the expected nose pose.
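The sketch below illustrates this disambiguation step under simplifying assumptions: poses are 3x3 homogeneous matrices, and the mouth-to-nose relation and all numbers are invented for illustration.

```python
# Toy sketch of parts disambiguating each other via coordinate transforms:
# a candidate mouth predicts where a nose should be, and a candidate nose
# is checked against that prediction.
import numpy as np

def pose(tx, ty, theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0, 0, 1.0]])

mouth_pose = pose(1.0, 2.0, 0.1)        # candidate mouth, image coordinates
mouth_to_nose = pose(0.0, 1.5, 0.0)     # learned relation: nose above mouth

expected_nose = mouth_pose @ mouth_to_nose  # where the mouth expects a nose
candidate_nose = pose(0.85, 3.5, 0.1)       # an actual candidate nose

# Agreement score: how close the candidate is to the expectation.
err = np.linalg.norm(expected_nose - candidate_nose)
print("pose mismatch:", err)
```

If the mismatch is small, the candidate mouth and nose support each other’s interpretation; if large, the combination is rejected.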
Ambiguous Multimodal Predictions: Predicting the whole object from its ambiguous parts is challenging, especially when dealing with ambiguous parts like circles that could be eyes, wheels, etc. This results in multimodal predictions at the next level up, requiring a more complex mechanism for disambiguation.
Iterative Net for Common Mode Selection: Disambiguation is achieved through an iterative network that averages similar predictions, selecting the common mode. This process identifies the most likely interpretation of the ambiguous parts based on agreement among nearby localities.
Log Probabilities for Combining Distributions: Combining multimodal predictions is done with unnormalized log probabilities rather than raw probabilities: averaging log probabilities corresponds to taking a geometric mean of the distributions, which sharpens the modes on which the predictions agree.
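Here is a toy numpy demonstration of the idea; the five discrete hypotheses and their scores are invented for illustration.

```python
# Toy sketch of combining two multimodal predictions by averaging their
# unnormalized log probabilities: shared modes survive, disputed modes
# are suppressed.
import numpy as np

def softmax(logp):
    e = np.exp(logp - logp.max())
    return e / e.sum()

# Two predictions over 5 discrete hypotheses, as unnormalized log probs.
logp1 = np.array([2.0, 2.0, -1.0, -1.0, -1.0])  # favours hypotheses 0 and 1
logp2 = np.array([2.0, -1.0, 2.0, -1.0, -1.0])  # favours hypotheses 0 and 2

combined = softmax((logp1 + logp2) / 2)  # geometric mean of distributions
print(combined)  # mass concentrates on hypothesis 0, the shared mode
```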
00:36:05 Composing High-Level Concepts from Sparse Feature Detectors
Introduction: Geoffrey Hinton presents GLOM, an imaginary system that addresses the challenge of representing parse trees without dynamically allocating neurons to nodes.
GLOM’s Functionality: GLOM uses embedding vectors to represent objects or concepts. Each neuron acts as a basis function in an unnormalized log probability space, with its activity serving as the coefficient on that basis function. The basis functions are vague, which allows ambiguous objects to be represented, and combining neuron activities in the log probability space yields a sharper representation.
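The following toy sketch, with an invented 1-D pose axis and Gaussian-shaped log-probability bumps as basis functions, shows how vague bases combine into a sharp distribution:

```python
# Toy sketch of neurons as basis functions in an unnormalized log
# probability space: each neuron contributes a vague log-probability
# "bump", and the weighted sum of bumps yields a much sharper
# distribution than any single neuron could.
import numpy as np

xs = np.linspace(-3, 3, 61)                   # an invented 1-D pose axis

def bump(center, width=1.5):
    return -((xs - center) ** 2) / (2 * width ** 2)  # broad log-prob basis

basis = np.stack([bump(c) for c in (-2.0, -0.5, 0.5, 2.0)])
activities = np.array([0.1, 1.0, 1.0, 0.1])   # neuron activities = coefficients

log_prob = activities @ basis                 # combine in log space
prob = np.exp(log_prob - log_prob.max())
prob /= prob.sum()
print("sharp mode at pose:", xs[prob.argmax()])  # near 0, where bumps agree
```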
Visualizing GLOM’s Performance: GLOM’s ability to reconstruct objects from ambiguous data is demonstrated using an example of ellipses forming objects such as faces and sheep.
Training GLOM: GLOM can be trained in an unsupervised manner, similar to language models like BERT. One objective encourages the formation of “islands” of similar embeddings. Another trains the bottom-up and top-down neural networks to agree with a consensus embedding derived from information at the same level and from other columns. This training promotes island formation and facilitates the sharing of knowledge between locations.
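A toy numpy sketch of these two training signals follows; all networks are stand-in random linear maps, and a random vector stands in for the attention-weighted lateral input.

```python
# Toy sketch of GLOM's two unsupervised training signals: (1) BERT-like
# denoising of a masked input and (2) agreement between the bottom-up
# and top-down predictions and the consensus embedding.
import numpy as np

rng = np.random.default_rng(2)
D = 16
x = rng.normal(size=D)                   # a location's input embedding
x_masked = np.where(rng.random(D) < 0.25, 0.0, x)  # mask a quarter of it

W_up, W_down, W_dec = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

bottom_up = x_masked @ W_up              # prediction from the level below
top_down = rng.normal(size=D) @ W_down   # stand-in prediction from above
lateral = rng.normal(size=D)             # stand-in attention-weighted average

consensus = (bottom_up + top_down + lateral) / 3.0

recon_loss = np.mean((consensus @ W_dec - x) ** 2)     # denoising objective
agree_loss = (np.mean((bottom_up - consensus) ** 2) +  # island-forming
              np.mean((top_down - consensus) ** 2))    # agreement objective
print("total loss:", recon_loss + agree_loss)
```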
Addressing Potential Objections: Hinton argues that replicating embedding vectors for a single object is not wasteful during the search for an interpretation of an image. He compares the multiple identical vectors pointing to the same object to pointers in computer science, and notes that the cost is further reduced by sparse long-range interactions, which need only sample a few locations to obtain sufficient information.
Summary of GLOM: GLOM combines recent ideas in neural networks to tackle the problem of representing parse trees without dynamic allocation of neurons. It employs universal capsules, represented by embedding vectors, that can represent anything. A detailed account is given in a paper available on arXiv.
Additional Questions: The possibility of skip connections between different levels of the architecture is raised. Hinton suggests that the hardware levels can be thought of as a window that moves over the hierarchy in a scene, using the example of the Bohr atom to illustrate how attention can be focused on particular levels of the hierarchy.
00:44:41 Neuromorphic Networks: Using Hardware for Thinking About Consciousness
Hardware Usage for Thinking: The brain uses similar hardware for thinking about the solar system and other concepts, enabling a mapping between reality and the hardware. Fixations allow for sequential changes in the mapping between hardware levels and the world.
Retina and Zoom: The retina’s non-uniform resolution means that moving the eyes effectively zooms: fixating a region brings it into the high-resolution fovea.
Skip Connections in Initialization: Convolutional neural networks are used for initializing different embedding levels. Direct initialization of higher-level objects is possible, bypassing lower levels.
Combining Parts to Detect Objects: Capsule networks allow for combining parts to detect certain objects. The challenge lies in integrating top-down and bottom-up information.
00:47:05 Self-Supervised Learning for 3D Scenes: From Object-Centric Views to Whole Scenes
Top-down interactions in capsule networks: In previous capsule papers, top-down interactions were not allowed to change the representation of the level below, resulting in a bottom-up approach that resolved ambiguities one layer at a time. In stacked capsule autoencoders, a set transformer is used to let parts disambiguate each other, much as a nose and a mouth disambiguate each other via coordinate transforms.
Coordinate transforms and attention weighted averaging: Glom simplifies the process by transmitting ambiguity to the next level via coordinate transforms, which can be resolved by attention weighted averaging. This approach relies on the ability of neural networks to represent multimodal distributions using neurons as basis functions in an unnormalized log probability space.
Applying self-supervised methods to 3D scenes: In human vision, when focusing on an object in an image (similar to ImageNet), it is often possible to find a large bounding box that contains that object. For 3D scenes, however, processing the entire scene at once may be necessary. To apply self-supervised methods to 3D scenes, it may be beneficial to adopt a similar approach to object-centric views, but further research is needed to determine how to effectively transition from an object-centric view to 3D scenes.
00:49:38 Falsifying Theories of Brain Function Through Engineering
Key Concepts and Insights:
Attention and Visual Processing: Our visual system uses attention to process specific parts of an image in fine detail while disregarding the periphery. The GLOM model is not yet properly developed for recognizing small objects in the periphery; Hinton deliberately keeps the architecture simple and acknowledges that more development is needed.
Object Perception and Attention Allocation: Hinton suggests that we can only see one object at a time. Recognizing an object requires seeing its parts, but fixating on a specific part yields a richer representation than general viewing. The model in the presentation covers a single fixation and does not address the sequential allocation of attention.
Questions and Responses:
Discovery of New Facial Features: Hinton notes that a well-functioning system could potentially identify facial features not previously known, though the current model’s limitations prevent a definitive answer. He discusses how neural fields with coordinate frames tend to align with an object’s natural symmetries, maximizing efficiency of representation.
Falsifying Hypotheses in Computational Neuroscience: Hinton emphasizes the advantage engineers have in falsifying models by demonstrating their practical shortcomings. Before computers, theories about the brain lacked empirical validation; building and testing models computationally allows direct falsification based on performance.
Leveraging GLOM with Limited Unlabeled Data: While the human visual system uses relatively little unlabeled data, Hinton acknowledges the need to explore how GLOM can be optimized for such scenarios, suggesting co-distillation and contrastive learning approaches as directions to investigate.
00:55:57 Unsupervised Learning Through Contrastive Representation
Introduction of an Unsupervised Learning System: Hinton proposes a novel unsupervised learning system that does not rely on labeled data. The system is designed to learn a hierarchical representation of images.
Key Components of the System:
Partial Autoencoder: The system masks out a patch of an image and iteratively fills it in, similar to an autoencoder.
Contrastive Learning: The system aims to make nearby patches of the image have similar representations.
Together, these two components enable the system to learn without labels.
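A toy numpy sketch of the two ingredients follows; the “image” is a row of random patch vectors and the infilling rule is a stand-in for a learned decoder.

```python
# Toy sketch: (1) mask a patch and iteratively fill it in from its
# surroundings, (2) encourage nearby patches to have similar
# representations.
import numpy as np

rng = np.random.default_rng(3)
patches = rng.normal(size=(6, 8))     # six 8-dim patch representations
target = patches[2].copy()
patches[2] = 0.0                      # mask out patch 2

# (1) Partial autoencoding: repeatedly re-estimate the masked patch as a
# blend of its spatial neighbours (stand-in for the real decoder).
for _ in range(10):
    patches[2] = 0.5 * patches[2] + 0.25 * (patches[1] + patches[3])
print("infill error:", np.linalg.norm(patches[2] - target))

# (2) Contrastive-style agreement: a loss that pulls neighbouring patch
# representations together (squared distance between adjacent patches).
agreement_loss = np.mean((patches[:-1] - patches[1:]) ** 2)
print("agreement loss:", agreement_loss)
```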
Connection to Brain Injuries: Hinton discusses a case of semantic access dyslexia, in which brain-damaged patients make semantic errors when reading words. He suggests that studying such brain injuries can help validate the effectiveness of brain models.
Relevance to Neural Network Models: Hinton highlights the relevance of neural network models in understanding brain functions. Specifically, he mentions models that map letters to semantics in word reading.
Conclusion: Hinton’s presentation introduces an unsupervised learning system that leverages partial autoencoders and contrastive learning. He emphasizes the potential of studying brain injuries to validate brain models and the role of neural networks in understanding brain functions.
Lateral Interactions and Semantic Attractors: Neural networks with attractors, where certain familiar sets of semantic features form an attractor through lateral interactions, can model the semantics of language.
Forward Pathway to Attractors: A forward pathway gets the network from the letters of a word to the semantics of the word, leading it to the appropriate attractor.
Damage to Attractors or Pathways: If either the attractor or the forward pathway is damaged, the letters can lead the network to a nearby but different attractor, producing a similar word such as “apricot” instead of “peach” (a toy sketch of this behavior follows below).
Neural Net Models and Neuropsychology: These neural net models explain seemingly strange phenomena observed by neuropsychologists in terms of mundane neural mechanisms.
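To make the attractor story concrete, here is a toy Hopfield-style sketch; the 32-bit patterns standing in for the semantics of “peach” and “apricot” are invented, and the damage is random deletion of connections.

```python
# Toy Hopfield-style sketch of semantic attractors. Two similar semantic
# patterns ("peach", "apricot") are stored via Hebbian learning; lateral
# interactions pull a noisy cue to the nearest attractor. Damaging the
# weights may send the same cue to the nearby wrong attractor.
import numpy as np

rng = np.random.default_rng(4)
peach = np.where(rng.random(32) < 0.5, 1.0, -1.0)
apricot = peach.copy()
apricot[:6] *= -1                     # a nearby but distinct attractor

W = np.outer(peach, peach) + np.outer(apricot, apricot)
np.fill_diagonal(W, 0)

def settle(state, W, steps=20):
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1.0, -1.0)  # lateral updates
    return state

cue = peach.copy()
cue[10:13] *= -1                      # forward pathway delivers a noisy cue
print("intact net recovers peach:", np.array_equal(settle(cue, W), peach))

W_damaged = W * (rng.random(W.shape) > 0.4)  # knock out ~40% of connections
out = settle(cue, W_damaged)
print("damaged net -> peach:", np.array_equal(out, peach),
      "| -> apricot:", np.array_equal(out, apricot))
```

With the intact weights the noisy cue settles to “peach”; with enough damage the same cue may land in the nearby “apricot” attractor, mirroring the neuropsychological errors described above.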
Abstract
Geoffrey Hinton’s Visionary Leap in AI: Unraveling the Intricacies of the Human Mind and Visual Perception
Geoffrey Hinton, an acclaimed AI pioneer from the University of Toronto and Google, has significantly advanced the field of artificial intelligence. His groundbreaking work includes the development of the GLOM neural network, a transformative approach aiming to closely mimic human visual processing. This system, which combines elements of transformers, unsupervised learning, and generative models, seeks to understand and replicate the hierarchical and psychological realities of human vision. Hinton’s contribution extends beyond GLOM, encompassing breakthroughs in deep neural networks, earning him the Turing Award in 2018, and his innovative work in areas such as contrastive self-supervised learning and the efficient representation of complex visual structures. His insights into mental imagery, structural parsing, and the handling of ambiguous visual information mark a significant step towards bridging the gap between human cognitive abilities and artificial intelligence.
Introduction of Geoff Hinton:
Geoff Hinton is a luminary in the field of artificial intelligence, known for his pioneering work on Boltzmann machines, variational learning, capsule networks, and data visualization. He popularized the backpropagation algorithm, a fundamental technique in deep learning, and played a key role in introducing word embeddings in natural language processing. His transformative contributions to the field earned him the Turing Award in 2018, which he shared with Yann LeCun and Yoshua Bengio.
Segment Summaries and Main Ideas
Geoff Hinton’s Contributions and Current Work:
Geoffrey Hinton’s pioneering work in AI, especially his development of the GLOM framework, has significantly influenced the field’s understanding of visual perception. GLOM’s unique approach to image processing emphasizes part-whole hierarchies and coordinate frames, providing a fresh perspective on how humans perceive and interpret visual information. Hinton’s structural description of a crown, utilizing a 1979 diagram, includes objects with intrinsic coordinate frames and transforms between objects and parts, reminiscent of old-fashioned computer graphics representations of geometry.
Conceptual Framework of GLOM:
GLOM’s architecture, inspired by human vision, encompasses a hierarchical structure of embedding vectors and neural fields to represent and process images. This system mirrors the human brain’s method of visual perception, focusing on parts and wholes, and adeptly manages the complexity of visual information. Each node in a structural description can have an associated coordinate transform to the viewer, facilitating the propagation of consistent viewpoint information throughout the structural description. This allows for efficient computation of relationships between distant parts in the structure.
Understanding Human Perception with GLOM:
Hinton’s insights into the psychological reality of human vision, demonstrated through experiments like cube perception and sentence ambiguity, highlight the complexity and multidimensionality of human perception. GLOM’s design reflects these intricacies, offering a pathway to more human-like AI. Hinton argues that mental images are not pixel-based but involve viewpoint assignment, allowing for the computation of relationships between different parts of the mental image. Mental images typically orient with north being vertical. He also provides an example of a mental imagery task involving navigation and orientation, further illustrating the use of viewpoint information in mental images.
The Role of Contrastive Self-Supervised Learning:
Contrastive self-supervised learning, a method Hinton helped introduce and refine, is crucial in GLOM’s ability to process and interpret visual information. The technique, first proposed in 1992 and widely recognized only much later, focuses on extracting similar representations from different image patches. SimCLR, a contrastive self-supervised learning algorithm developed in Toronto, processes two augmented crops of an image to obtain representations, which are then dimensionally reduced and subjected to a contrastive loss. The approach is effective at extracting scene representations and, on tasks like ImageNet, object representations. Once representations are obtained, a linear classifier with softmax achieves performance comparable to supervised methods while training only the final layer of weights.
The Architectural Complexity and Innovations in GLOM:
GLOM’s intricate architecture, involving multiple levels of embeddings, complex coordinate transformations, and echo chambers, showcases an advanced approach to handling ambiguity and representing visual structures in a way that mimics the human brain’s processing capabilities. GLOM, inspired partly by biology, aims to achieve spatial coherence of familiar shapes through attention and to represent objects with identical vectors at the object level. The architecture replicates weights, similar to knowledge replication in cells, and uses hardware equivalents such as a column of hardware representing a small patch of an image.
GLOM’s Implementation and Future Potential:
The implementation of GLOM, utilizing techniques like unsupervised learning and knowledge transfer, demonstrates its versatility in AI, capable of addressing complex visual tasks and advancing our understanding of both artificial and human intelligence. Vision is conceptualized as a sampling process, where the focus is on detailed processing of specific areas.
Hinton’s Perspective on AI and Neuroscience:
Hinton’s work, while primarily focused on artificial intelligence, also provides significant insights into neuroscience and cognitive science. He advocates the use of neural networks in scientific research and validates brain models through computational approaches, creating a unique intersection between AI and human cognition.
Geoffrey Hinton’s development of the GLOM neural network exemplifies the potential of AI in mimicking and understanding human cognitive processes. By bridging AI, visual perception, and cognitive science, his contributions deepen our understanding of both artificial and human intelligence, opening new paths for exploration and innovation. His work not only enhances our comprehension of the complex nature of the human mind but also paves the way for more advanced and human-like artificial intelligence systems.
Update:
Hinton’s recent work further explores GLOM’s architecture and functionality, describing a three-level embedding system within each column of the hierarchical neural fields model. He emphasizes bottom-up, top-down, and local interactions in shaping these embeddings and introduces transformational random fields for disambiguating parts, which lets the model locate parts relative to each other. He also proposes an iterative network that uses log probabilities to combine distributions and select the most likely interpretation of ambiguous parts.
Supplemental Update:
Hinton’s unsupervised learning system learns a hierarchical representation of images without relying on labeled data, using partial autoencoders and contrastive learning. He suggests studying brain injuries to validate brain models and explores neural networks with attractors to model semantic meaning in language. These developments continue to bridge the gap between AI and neuroscience, providing valuable insights into the human mind.