Geoffrey Hinton (Google) (Dec 2016)

Geoffrey Hinton (Google Scientific Advisor) – Neural Networks for Machine Learning by Geoffrey Hinton (Lecture 1/16) (Dec 2016)

Chapters

00:00:29 Machine Learning: From Basic Concepts to Advanced Examples

00:10:55 Deep Neural Networks Revolutionizing Speech Recognition

00:13:21 How Neurons Work

00:17:55 Neural Networks: Learning and Flexibility in the Brain

Neurons and Communication:
The brain consists of neurons that receive inputs from other neurons or receptors. These neurons communicate with each other by sending spikes of activity. Synaptic weights control the effect of an input line on a neuron and can be positive or negative.

Weight Adaptation and Learning:
By adapting synaptic weights, the network learns to perform various tasks like object recognition, language comprehension, planning, and body movement control.

Storage Capacity:
The brain has approximately 10^11 neurons, each with about 10^4 weights, giving on the order of 10^15 (or perhaps only 10^14) synaptic weights. A significant portion of these weights can affect the ongoing computation within a few milliseconds, providing immense bandwidth to stored knowledge.

Modularity and Functional Localization:
The cortex is modular, meaning different parts of it learn to perform different functions. Inputs from senses go to specific parts of the cortex, influencing their functionality. Local damage to the brain affects specific functions, such as language comprehension or object recognition.

Brain Scanning and Function Location:
Brain scanners can visualize blood flow, indicating which parts of the brain are active during specific tasks. The cortex appears largely similar throughout, suggesting a universal learning algorithm.

Functional Relocation and Adaptability:
Early brain damage leads to functional relocation to other parts of the brain, demonstrating flexibility in function assignment. Experiments with baby ferrets show that rerouting sensory inputs can cause the brain to adapt and learn new functions.

General-Purpose Cortex and Flexibility:
The cortex can transform into special-purpose hardware for specific tasks based on experience. This combination of rapid parallel computation and flexibility allows for learning new functions.

Comparison with Conventional Computers:
Conventional computers get their flexibility from stored sequential programs, which requires fast central processors. The brain’s flexibility comes from adapting synaptic weights, enabling parallel computation and efficient learning.

00:22:00 Neurons in Neural Networks: Linear, Binary Threshold, and Rectified Linear

00:27:19 Neurons and Learning Algorithms in Artificial Neural Networks

Logistic Neurons:
Sigmoid neurons are commonly used in artificial neural networks. They produce a real-valued output that is a smooth and bounded function of their total input. The logistic function is used to compute the output, which ranges from 0 to 1. The derivatives of the logistic function are smooth and continuous, making it suitable for learning algorithms.
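As a minimal sketch of the logistic unit described above, the following Python computes the total input (bias plus weighted inputs) and passes it through the logistic function 1 / (1 + e^-z). The function names and the numerically stable branch for negative inputs are illustrative assumptions, not details from the lecture.

```python
import math

def logistic(z):
    """Smooth, bounded squashing function with output between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_derivative(z):
    """Derivative of the logistic output with respect to z: y * (1 - y)."""
    y = logistic(z)
    return y * (1.0 - y)

def logistic_neuron(inputs, weights, bias):
    """Total input (bias plus weighted sum) followed by the logistic function."""
    z = bias + sum(x * w for x, w in zip(inputs, weights))
    return logistic(z)
```

The derivative has the simple closed form y(1 - y), which is one reason this unit is convenient for gradient-based learning.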

Stochastic Binary Neurons:
Stochastic binary neurons use the same equations as logistic units. They compute the total input and use the logistic function to calculate the probability of producing a spike. Instead of outputting that probability as a real number, they make a probabilistic decision and output a 1 or a 0. The randomness is intrinsic to the unit: the logistic output only sets the probability that a spike is produced in a short time window.
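A minimal sketch of a stochastic binary unit, assuming the probability of a spike is the logistic of the total input as described above. The function names and the optional `rng` parameter (used only to make runs repeatable) are illustrative assumptions.

```python
import math
import random

def spike_probability(inputs, weights, bias):
    """Logistic of the total input, interpreted as the probability of a spike."""
    z = bias + sum(x * w for x, w in zip(inputs, weights))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # equivalent form that avoids overflow for negative z
    return e / (1.0 + e)

def stochastic_binary_neuron(inputs, weights, bias, rng=random):
    """Output 1 with probability p and 0 otherwise, instead of outputting p."""
    p = spike_probability(inputs, weights, bias)
    return 1 if rng.random() < p else 0
```

Calling the unit repeatedly with the same input gives a mix of 0s and 1s whose average approaches the logistic output.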

Rectified Linear Units (ReLUs):
ReLUs use a different activation function: the output equals the total input when it is positive and is 0 otherwise. As with stochastic binary neurons, randomness can be introduced by treating the ReLU’s output as the rate of spike production (a Poisson rate): the unit determines the rate, while intrinsic randomness determines when individual spikes occur.
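One way to read the spike-rate idea above is sketched below, assuming the ReLU's output is used as the rate of a Poisson process, so that gaps between spikes are exponentially distributed. The function names, the `duration` parameter, and the time units are hypothetical choices for illustration.

```python
import random

def relu(z):
    """Output the total input if it is positive, otherwise 0."""
    return z if z > 0 else 0.0

def spike_times(z, duration, rng=random):
    """Sample spike times in [0, duration) with the ReLU output as the rate."""
    rate = relu(z)
    times = []
    if rate == 0.0:
        return times  # a silent unit never spikes
    # In a Poisson process, gaps between events are exponential with this rate.
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times
```

The unit fixes only the average number of spikes per unit time; the exact spike times differ from run to run.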

Example of Machine Learning:
A simple neural network is trained to recognize handwritten shapes. The network has two layers: input neurons representing pixel intensities and output neurons representing classes. Each pixel “votes” for multiple shapes, and the shape with the most votes is recognized.

Weight Display and Learning Algorithm:
Weights between input and output units are displayed in a map, with each output unit having its own map. The strength of each connection is represented by a black and white blob, with the area indicating magnitude and the color indicating sign. A learning algorithm is used to adjust the weights based on training examples. The algorithm increments weights from active pixels to the correct class and decrements weights from active pixels to the class guessed by the network. This prevents weights from growing too large and encourages the network to learn the correct patterns.
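The update rule described above can be sketched as follows. This is a toy illustration under stated assumptions: images are flat lists of pixel activities, the two 3x3 "shapes" (a vertical and a horizontal bar) are made-up data, and the function names are hypothetical.

```python
def votes(weights, pixels):
    """Each active pixel votes for every class through its weight to that class."""
    return [sum(w * x for w, x in zip(row, pixels)) for row in weights]

def guess_class(weights, pixels):
    """The class with the most votes wins (ties go to the lowest index)."""
    scores = votes(weights, pixels)
    return max(range(len(scores)), key=scores.__getitem__)

def train_step(weights, pixels, correct):
    """Increment weights to the correct class, decrement weights to the guess."""
    guess = guess_class(weights, pixels)
    for i, x in enumerate(pixels):
        if x > 0:
            weights[correct][i] += x
            weights[guess][i] -= x  # cancels the increment when the guess is right
    return guess

# Toy data: a vertical bar (class 0) and a horizontal bar (class 1) on a 3x3 grid.
vertical = [0, 1, 0,
            0, 1, 0,
            0, 1, 0]
horizontal = [0, 0, 0,
              1, 1, 1,
              0, 0, 0]

weights = [[0.0] * 9 for _ in range(2)]  # one weight map per output class
for _ in range(5):
    for pixels, label in [(vertical, 0), (horizontal, 1)]:
        train_step(weights, pixels, label)
```

When the guess is already correct, the increment and decrement land on the same weights and cancel, so the weights change only on mistakes.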

Results:
After showing the network several hundred training examples, the weights converge to patterns that resemble templates for each shape. The network effectively learns to recognize handwritten shapes.

00:34:21 Types of Machine Learning: Supervised, Reinforcement, and Unsupervised

Challenges of Simple Neural Networks for Handwritten Digit Recognition:
Geoffrey Hinton discusses the limitations of a simple neural network in recognizing handwritten digits. Because the network only learns a template for each whole shape, it cannot model the allowable variations in a digit, which requires extracting features and examining how they are arranged. The examples shown demonstrate that whole-shape templates are insufficient for this task.

Three Types of Machine Learning:
Hinton introduces three broad groups of machine learning algorithms: supervised learning, reinforcement learning, and unsupervised learning. Supervised learning aims to predict an output given an input vector, with the goal of minimizing the discrepancy between the target output and the actual output. Reinforcement learning focuses on selecting actions or sequences of actions to maximize rewards, which may occur occasionally. Unsupervised learning seeks to discover a good internal representation of the input data.

Supervised Learning:
Supervised learning is further divided into two categories: regression and classification. Regression involves predicting a real number or a vector of real numbers as the target output, aiming to get as close as possible to the correct value. Classification involves predicting a category or label from a set of alternatives, with the simplest case being a binary classification between positive and negative.

Model Class and Parameter Adjustment:
In supervised learning, a model class is selected, which defines the mapping from an input vector to an output using numerical parameters. The goal is to adjust these parameters to make the mapping fit the supervised training data. The discrepancy between the target output and the actual output is minimized using various measures, with the squared difference being a common choice.
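To make the idea of adjusting parameters to fit training data concrete, here is a minimal sketch. It assumes the simplest possible model class (a line y = w*x + b), the squared difference as the discrepancy measure, and plain gradient descent as the adjustment procedure; the data, learning rate, and function names are illustrative assumptions.

```python
def predict(w, b, x):
    """Model class: a straight line with two numerical parameters."""
    return w * x + b

def squared_error(w, b, data):
    """Half the summed squared difference between target and actual output."""
    return 0.5 * sum((t - predict(w, b, x)) ** 2 for x, t in data)

def fit(data, learning_rate=0.1, steps=2000):
    """Adjust w and b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum((predict(w, b, x) - t) * x for x, t in data) / n
        grad_b = sum((predict(w, b, x) - t) for x, t in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy training set of (input, target) pairs generated from t = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit(data)
```

On this noise-free data the fitted parameters approach w = 2 and b = 1, and the squared error approaches zero.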

Reinforcement Learning:
Reinforcement learning involves selecting actions or sequences of actions based on occasional rewards. The goal is to maximize the expected sum of future rewards, usually with a discount factor so that rewards far in the future count for less than immediate ones. Reinforcement learning is challenging because rewards are often delayed, which makes it hard to tell which action was responsible, and because a sparse scalar reward carries too little information to learn millions of parameters.
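The discounted sum of future rewards mentioned above can be computed as in this short sketch. The function name, the symbol `gamma` for the discount factor, and the example reward sequences are assumptions made for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return from next step).
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# A delayed reward is worth less than the same reward received immediately.
immediate = discounted_return([10.0, 0.0, 0.0])
delayed = discounted_return([0.0, 0.0, 10.0])
```

With a discount factor below 1, `delayed` is smaller than `immediate`, which is how the discount factor prioritizes immediate rewards.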

Unsupervised Learning:
Unsupervised learning aims to discover a good internal representation of the input data. For a long time, the machine learning community largely ignored unsupervised learning, except for clustering, which is a limited form of unsupervised learning.

00:40:12 Unsupervised Learning: Goals and Techniques

Abstract

The Evolution and Applications of Machine Learning: From Patterns to Brain-like Computation

—

Abstract:

Machine learning, a branch of computer science that enables computers to learn without explicit programming, has significantly impacted various domains. Its proficiency lies in identifying patterns and irregularities in data, which is crucial in image and sensor pattern recognition, predictive analysis, and anomaly detection.

—

Machine learning algorithms are broadly categorized into supervised and unsupervised learning. Supervised learning algorithms use labeled data to learn a function mapping input data to output labels, while unsupervised learning algorithms uncover structures and patterns in unlabeled data. A classic benchmark for supervised learning is the MNIST database of 70,000 grayscale images of handwritten digits.

Neural networks, drawing inspiration from the human brain, have advanced object recognition capabilities, successfully identifying various object classes under different conditions. In speech recognition, deep neural networks have improved accuracy by excelling at acoustic modeling. A notable result is the system developed by George Dahl and Abdel-rahman Mohamed, which achieved a 20.7% error rate on the TIMIT phone-recognition benchmark, beating the previous best of 24.4%.

Turning from artificial neural networks to biological neurons reveals a different approach to information processing. Neurons, the brain’s fundamental units, are vastly different from traditional serial processors, and synapses, the junctions between neurons, are key to learning and complex computations. Synapses offer compactness, efficiency, and adaptability, but understanding their role in learning remains a challenge.

The human brain’s computational power is immense, with its network of neurons and synaptic weights facilitating high-bandwidth knowledge processing. The cortex is particularly flexible, capable of adapting to new sensory inputs and relocating functions in case of damage. This contrasts with conventional computers, which depend on fast central processors and stored programs for flexibility. The brain’s efficiency lies in its parallel computation and flexibility through learned synaptic weights.

Machine learning encompasses various types, including supervised, reinforcement, and unsupervised learning, each addressing unique challenges. Unsupervised learning, for instance, aims to transform high-dimensional inputs into more economical codes. The field seeks to develop internal representations that facilitate learning, moving beyond traditional clustering.

The article concludes by emphasizing the evolution of machine learning from simple pattern recognition to emulating brain-like computation. Despite the use of simplified neuron models for foundational understanding, ongoing research continues to reveal the complexities and capabilities of this expanding field.

Neurons and Communication:

Neurons in the brain receive inputs from other neurons or receptors and communicate through spikes of activity. Synaptic weights, which can be positive or negative, control the impact of these inputs.

Weight Adaptation and Learning:

By adjusting synaptic weights, the network learns to perform tasks like object recognition, language comprehension, and body movement control.

Storage Capacity:

The brain’s approximately 10^11 neurons, each with about 10^4 weights, result in a vast number of synaptic weights, enabling immense bandwidth for stored knowledge.

Modularity and Functional Localization:

The cortex’s modularity means different parts learn different functions. Inputs from the senses influence the functionality of specific cortex areas. Local brain damage affects particular functions, such as language comprehension or object recognition.

Brain Scanning and Function Location:

Brain scanners reveal active brain areas during specific tasks through blood flow visualization. The cortex’s uniform appearance suggests a universal learning algorithm.

Functional Relocation and Adaptability:

Early brain damage can lead to functional relocation, showing the brain’s adaptability in function assignment. Experiments with baby ferrets demonstrate this adaptability through the learning of new functions when sensory inputs are rerouted.

General-Purpose Cortex and Flexibility:

The cortex transforms into specialized hardware for specific tasks based on experience, combining rapid parallel computation and flexibility for new function learning.

Comparison with Conventional Computers:

In contrast to conventional computers that rely on stored sequential programs, the brain’s flexibility stems from adapting synaptic weights, facilitating parallel computation and efficient learning.

Understanding Neural Idealization

Idealization simplifies complex systems like the brain, making them manageable for study. This approach allows the application of mathematics and familiar system analogies, with the understanding that models, while imperfect, can be valuable.

Linear Neurons

A linear neuron’s output is its bias plus the sum, over all input lines, of the activity on each line multiplied by that line’s synaptic weight. Plotted against the total input, the output is a straight line.

Binary Threshold Neurons

McCulloch and Pitts introduced binary threshold neurons, influencing the design of universal computers. These neurons compute a weighted sum of inputs, triggering a spike of activity if a threshold is exceeded. They initially represented the brain as a logic-based system, but current understanding views it as a combiner of various unreliable evidence sources.

Rectified Linear Neurons (ReLUs)

ReLUs combine linear and binary threshold neurons’ features. They compute a linear weighted sum of inputs and apply a non-linear function to the result. Their input-output curve is linear above zero and zero otherwise, incorporating linear system properties while allowing for decisive outputs.
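The three deterministic units described above apply different output functions to the same weighted sum, as this minimal sketch shows. The function names are hypothetical, and folding the threshold into the bias (threshold = -bias, so the unit fires when the total input is at least zero) is an assumption made to keep the three units comparable.

```python
def total_input(inputs, weights, bias):
    """Bias plus the sum of each input activity times its synaptic weight."""
    return bias + sum(x * w for x, w in zip(inputs, weights))

def linear_neuron(inputs, weights, bias):
    """Output is the total input itself: a straight line."""
    return total_input(inputs, weights, bias)

def binary_threshold_neuron(inputs, weights, bias):
    """Emit a spike (1) if the total input reaches the threshold, else 0."""
    return 1 if total_input(inputs, weights, bias) >= 0 else 0

def rectified_linear_neuron(inputs, weights, bias):
    """Linear above zero, exactly zero otherwise."""
    z = total_input(inputs, weights, bias)
    return z if z > 0 else 0.0

# Same inputs and weights, two different biases.
x, w = [1.0, 2.0], [0.5, -1.0]
below = (linear_neuron(x, w, 0.25), binary_threshold_neuron(x, w, 0.25),
         rectified_linear_neuron(x, w, 0.25))
above = (linear_neuron(x, w, 2.0), binary_threshold_neuron(x, w, 2.0),
         rectified_linear_neuron(x, w, 2.0))
```

With the smaller bias the total input is negative, so only the linear unit produces a nonzero output; with the larger bias all three units respond.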

Logistic Neurons

Logistic neurons, commonly used in artificial neural networks, output real values as a smooth function of their total input. The logistic function, ranging from 0 to 1, is used for output computation, with its derivatives being smooth and continuous, aiding learning algorithms.

Stochastic Binary Neurons

These neurons use the logistic function to calculate the probability of producing a spike, then output a binary result based on a probabilistic decision. The randomness is intrinsic to the unit: the same total input can produce a spike on one occasion and not on another.

Rectified Linear Units (ReLUs)

These are the same rectified linear units described above: the output equals the input when the input is positive and is zero otherwise. Their output can also be treated as a rate of spike production, with randomness determining when individual spikes occur.

Example of Machine Learning:

A simple neural network trained to recognize handwritten shapes has input neurons representing pixel intensities and output neurons for shape classes. Pixels vote for shapes, with the most votes determining recognition.

Weight Display and Learning Algorithm:

The network’s weights are visualized in a map, with each output unit’s connections represented by black and white blobs indicating magnitude and sign. A learning algorithm adjusts weights based on training examples, incrementing weights for active pixels to the correct class and decrementing for the guessed class.

Results:

After training with several hundred examples, the network’s weights converge to templates resembling each shape, effectively learning shape recognition.

Overview of Supervised, Reinforcement, and Unsupervised Learning

Challenges of Simple Neural Networks for Handwritten Digit Recognition:

Geoffrey Hinton highlights the limitations of basic neural networks in recognizing handwritten digits. These networks struggle with capturing the variations in handwritten digits, failing to model allowable variations through feature extraction and arrangement analysis. The inadequacy of whole shape templates for this task is evident in presented examples.

Three Types of Machine Learning:

Hinton outlines three main groups of machine learning algorithms: supervised, reinforcement, and unsupervised learning. Supervised learning predicts outputs from input vectors, aiming to minimize target and actual output discrepancies. Reinforcement learning selects actions to maximize rewards, facing challenges from delayed and sparse rewards. Unsupervised learning discovers efficient internal representations of input data.

Supervised Learning:

Supervised learning is categorized into regression, predicting real numbers or vectors, and classification, predicting categories from alternatives. The goal is to adjust model parameters to fit the training data, commonly using the squared difference to minimize output discrepancies.

Reinforcement Learning:

This type of learning selects actions based on rewards, using a discount factor to prioritize immediate rewards. It is challenging because rewards are delayed and because sparse rewards carry little information.

Unsupervised Learning:

Unsupervised learning’s goal is to discover efficient representations of inputs, moving from high-dimensional representations like images to more manageable forms. This includes finding manifolds in high-dimensional spaces and representing inputs in terms of learned features for economical representation.

Definition of Unsupervised Learning:

Traditionally seen as clustering, unsupervised learning aims to create internal representations useful for subsequent learning, going beyond mere clustering.

Two-Stage Learning:

This approach separates unsupervised learning from supervised or reinforcement learning, so that rewards and punishments are not needed to set the parameters of, for example, a visual system. One example is learning to judge the distance to a surface without relying on constant feedback.

Goals of Unsupervised Learning:

The objective is to compress high-dimensional inputs into lower-dimensional representations. This involves finding manifolds in high-dimensional spaces and representing inputs economically in terms of learned features, such as binary features or sparse real-valued features.

Machine learning’s evolution from simple pattern recognition to brain-like computation underscores its vast potential in various applications. The field continues to grow, delving deeper into the complexities and capabilities of both artificial and biological computational models.

Notes by: datagram
