Peter Norvig (Google Director of Research) – Deep Learning and Understandability vs Software Engineering and Verification (Mar 2016)


Chapters

00:00:06 Bridging the Gap Between Machine Learning Experts and Software Engineers
00:06:11 Machine Learning as an Agile Tool in Software Engineering
00:10:19 Agile Software Engineering with Machine Learning
00:14:36 Learning Complete Programs from Examples
00:21:10 Learning Entire Programs: Challenges and Approaches
00:29:55 Understanding User Intent in Formal Programming Languages
00:32:37 Compilers and Interpreters: Learning from User Errors
00:35:56 Machine Learning as an Agile Software Tool
00:39:00 Challenges in Machine Learning
00:42:24 Challenges of Machine Learning in Software Engineering
00:53:20 Deep Learning Tools and Techniques

Abstract

Unraveling the Evolution of Software Engineering: Bridging Machine Learning and Traditional Development

Introduction

In a rapidly evolving tech landscape, the paradigm of software engineering is shifting. Sponsored by Wipro, a leader in global technology services, Peter Norvig, Google’s Director of Research, recently shed light on this transformation. He emphasized the burgeoning relationship between machine learning and traditional software engineering, suggesting a fusion of these domains. This article delves into the changing nature of software engineering, the rise of deep learning, and the challenges and opportunities that lie at the intersection of machine learning and traditional software practices.

The Evolution of Software Engineering

Software engineering, traditionally viewed as a mathematical science, is undergoing a radical transformation. Its essence, once rooted in rigorous coding and validation by individuals or teams, is morphing into something closer to a natural science. Developers increasingly rely on pre-written code from repositories like GitHub, forming hypotheses about how that code behaves and testing them empirically, much as scientists do. This shift, as Norvig points out, reflects a growing trend in which software development is less about proving correctness and more about empirical validation and iterative refinement.

The Convergence of Machine Learning and Software Development

The intersection of machine learning with software development is redefining the landscape. Software, essentially a function meeting specified needs, is now increasingly intertwined with machine learning, which seeks to approximate functions based on input-output examples. This approach, exemplified in deep learning’s multiple abstraction levels, has gained significant traction, particularly highlighted by the global attention on the Lee Sedol-AlphaGo match in South Korea.

In this new era, a data scientist’s role transcends traditional boundaries, encompassing the formulation of ideas, data interpretation, and algorithmic feedback. Machine learning has seeped into diverse aspects of software engineering, from spell correction using Naive Bayes algorithms to the sophisticated neural networks driving machine translation and game-playing strategies in systems like AlphaGo.
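
Norvig's own writing on spelling correction shows how little code this style of machine learning can take. The sketch below is a minimal corrector in that noisy-channel spirit: it ranks candidate words one edit away from the typo by their frequency in a corpus. The tiny corpus is illustrative; a real corrector would use millions of words and a learned error model.

```python
import re
from collections import Counter

# Toy corpus standing in for a large text collection (illustrative only).
CORPUS = "the quick brown fox jumps over the lazy dog the dog barks"
WORDS = Counter(re.findall(r"[a-z]+", CORPUS))

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    """Pick the known candidate with the highest corpus frequency."""
    candidates = ({word} & WORDS.keys()) or (edits1(word) & WORDS.keys()) or {word}
    return max(candidates, key=lambda w: WORDS[w])

print(correct("dgo"))  # → dog
```

The Bayesian reading: corpus frequency approximates the prior P(word), and restricting candidates to one edit is a crude stand-in for the error model P(typo | word).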

Learning from Examples: A Paradigm Shift

A pivotal aspect of this evolution is the concept of learning programs from examples. The journey, beginning in the 1980s with inductive logic and genetic programming, has now reached a stage where probabilistic representations and deep learning significantly enhance program induction. The focus has shifted towards synthesizing programs in domain-specific languages, with a more recent trend being the generation of parameters for existing program structures. Yann LeCun’s analogy beautifully encapsulates this: unsupervised learning is the substantial cake base, supervised learning the icing, and reinforcement learning the cherry on top.

Challenges in Machine Learning for Software Engineering:

– Data selection and management: Determining how much past data to use, handling configuration dependencies, and ensuring consistency between training and production data.

– Tooling and infrastructure: Lack of established tools and frameworks for data management and analysis in machine learning, leading to increased chances of errors.

– Privacy and security: Ensuring the protection of sensitive data used in machine learning systems.

– Wide vs. narrow interfaces: Balancing the need for minimal data transfer between modules with the risk of insufficient information for decision-making.

– System complexity: Difficulty in combining different components and modules effectively, leading to potential failures despite individual components functioning correctly.

Combining Machine Learning and Traditional Software Engineering:

– Integrating machine learning and software engineering approaches: Exploring ways to combine machine learning models with traditional software engineering techniques to create robust and maintainable systems.

– Continuous learning and adaptation: Developing systems that can learn and adapt over time, incorporating new data and knowledge to improve performance.

Researchers have long explored the concept of program induction, aiming to generate programs from examples. Initial efforts focused on simple examples, such as sorting or reversing a list. Inductive logic programming, genetic programming, and logical and functional languages were among the techniques used. Despite progress, these methods faced limitations in handling larger and more complex programs.

Increased computing power and improved representations led to advancements in program induction in the 2000s. The focus shifted from logic-based to probability-based approaches, enabling the identification of the most likely programs. Deep learning techniques, LSTMs, and complex intermediate representations further enhanced the capabilities of program induction.

Tricks and techniques for efficient program induction include using languages with a single canonical way to express each operation, employing stronger type systems and total functional programming to sidestep the halting problem (every well-typed program is guaranteed to terminate), and targeting domain-specific languages, which are far more amenable to program induction than general-purpose languages.
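
The simplest form of this idea can be sketched as enumerative synthesis over a tiny domain-specific language: generate candidate programs shortest-first and return the first one consistent with every input-output example. The DSL below is hypothetical and deliberately small; real systems add types, probabilities, and aggressive pruning to tame the search space.

```python
from itertools import product

# A tiny list-manipulation DSL (hypothetical, for illustration).
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": sorted,
    "tail": lambda xs: xs[1:],
    "double": lambda xs: [2 * x for x in xs],
}

def run(program, xs):
    """Apply a sequence of primitive names, left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs

def synthesize(examples, max_depth=3):
    """Enumerate compositions of primitives, shortest first, and return
    the first program consistent with all input-output examples."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None

examples = [([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])]
print(synthesize(examples))  # → ('sort', 'double')
```

The search is exponential in program length, which is exactly why the restrictions above (small DSLs, canonical forms, strong types) matter: each one shrinks the space that has to be enumerated.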

Alternative approaches to program generation involve generating parameters for existing program structures, using deep neural networks to replace complex hand-built translation pipelines, and having deep reinforcement learning systems learn to play Atari games directly from raw game experience.
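
The Atari result rests on reinforcement learning: the system receives only states and rewards, never labelled "correct moves". A minimal sketch of the same idea is tabular Q-learning on a toy five-cell corridor rather than a deep network on pixels; the environment and hyperparameters below are illustrative.

```python
import random

random.seed(0)  # deterministic run for illustration

# Toy environment: a 5-cell corridor; reaching the rightmost cell pays 1.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

# Q[s][a]: learned estimate of long-term reward for action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # illustrative hyperparameters

for _ in range(500):  # learn purely from experienced transitions
    s, done = 0, False
    while not done:
        if random.random() < epsilon:        # occasionally explore
            a = random.randrange(2)
        else:                                # otherwise exploit estimates
            a = max((0, 1), key=lambda i: Q[s][i])
        s2, r, done = step(s, ACTIONS[a])
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# Greedy policy per non-terminal state: 1 means "move right".
policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(policy)
```

The deep-learning versions replace the table `Q` with a neural network over screen pixels, but the learning signal is the same: experienced rewards, not labelled examples.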

However, challenges and limitations remain. Program induction is challenging for complex programs, and the boundaries of deep learning approaches’ success and failure are not fully understood. Long-range planning and strategic thinking pose difficulties for deep learning algorithms.

The Challenge of Learning Entire Programs

Despite these advances, learning entire programs remains an arduous task, primarily due to the traditional non-differentiable nature of programs. Neural Turing machines present a promising direction by enabling differentiation and optimization through gradient descent. However, their effectiveness is still limited, evident in the complexity of tasks like the alignment problem in machine translation, which requires extensive code for relatively straightforward mathematical formulas.
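
The key trick that makes such models trainable is replacing a hard, discrete memory read with a soft one: a softmax-weighted average over all memory slots, which is differentiable everywhere. Below is a minimal sketch of such a soft "attention" read, with illustrative toy vectors in plain Python.

```python
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Soft, differentiable read: every memory slot contributes, weighted
    by a softmax over its dot-product similarity to the query, so gradient
    descent can adjust how attention is allocated."""
    scores = [sum(q * m for q, m in zip(query, row)) for row in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    read = [sum(w * row[d] for w, row in zip(weights, memory)) for d in range(dim)]
    return weights, read

# Three encoder states and one decoder query (illustrative toy vectors).
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 1.0]

weights, read = attend(query, memory)
print([round(w, 3) for w in weights])  # slots similar to the query get more weight
```

This is essentially the alignment mechanism from neural machine translation: the "formula" is a few lines of math, even if production implementations wrap it in far more code.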

Improving Human-Computer Interaction and Understanding

Wolfram Alpha represents a significant stride in this direction, blending natural and formal languages to interpret user queries. This approach, focusing on user intent, is a stark contrast to the rigid formal languages typically used in programming. The recommendation is clear: programming languages should evolve to better understand user intent, akin to Wolfram Alpha. This evolution extends to compilers, which can benefit from a database of user interactions, helping them provide more context-aware error messages and suggestions.

The Role of Machine Learning in Agile Software Development

Machine learning, now a potent tool in agile software development, offers rapid iteration and improvement capabilities. However, this comes with a caveat: the risk of accumulating technical debt. The dynamic nature of machine learning systems means that changes in one part can have far-reaching effects, necessitating a continuous cycle of adaptation and updates. This, coupled with data-related challenges such as non-stationary data and immature tooling, underscores the need for a cautious approach to integrating machine learning into software development.

AlphaGo: A Case Study in Deep Learning Success

AlphaGo’s Success in Go:

– Leveraging self-play: AlphaGo’s ability to play against itself and generate meaningful data for training, enabling it to surpass the performance of existing Go programs.

– Developing an evaluation function: AlphaGo’s ability to evaluate board positions effectively allowed it to make informed decisions and focus on promising moves, leading to its superior performance.

AlphaGo’s triumph in the field of Go is a testament to the efficacy of deep learning in complex decision-making scenarios. By harmonizing expert data, self-play, and novel evaluation functions, AlphaGo demonstrated a level of proficiency that surpassed existing Go programs. This success story also underscores the challenges in debugging and understanding machine learning systems, highlighting the need for better visualization and analysis tools.
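
The interplay of an evaluation function and self-play can be sketched on a toy game. Everything below is a hypothetical stand-in: the "evaluation function" is a hand-written heuristic and the game is trivial, whereas AlphaGo learns a value network from millions of self-play positions.

```python
# Toy game: a counter starts at 0; players alternate adding 1, 2, or 3;
# whoever reaches exactly 10 wins. Everything here is illustrative.
TARGET = 10

def evaluate(state):
    """Hypothetical evaluation function (distance to target), standing in
    for the learned value network that scores board positions."""
    return -abs(TARGET - state)

def best_move(state):
    # One-ply search guided by the evaluation function: score each
    # successor position instead of exploring the whole game tree.
    return max((1, 2, 3), key=lambda m: evaluate(state + m))

def self_play():
    """Play one greedy game and record (state, winner) pairs: the kind of
    training data AlphaGo generates by playing against itself."""
    states, state, player = [], 0, 0
    while state < TARGET:
        states.append(state)
        state += best_move(state)
        player = 1 - player          # the other player moves next
    winner = 1 - player              # whoever just moved reached the target
    return [(s, winner) for s in states]

print(self_play())
```

In the real system these two pieces reinforce each other: self-play generates positions labelled with eventual outcomes, and those labels train a better evaluation function, which in turn produces stronger self-play.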

Machine Learning for User-Friendly Error Messages

Machine learning can also make error messages more user-friendly. By analyzing large numbers of user interactions and identifying the misconceptions that commonly lead to a given error, compilers and interpreters can move beyond terse diagnostics to messages that explain the underlying issue and suggest likely fixes.
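
One simple way to realize this is to mine a log of past errors and the fixes users applied, then suggest the fix attached to the most similar previously seen error. The sketch below uses plain string similarity as a stand-in for a learned model; the error log and suggestions are hypothetical.

```python
import difflib

# Hypothetical log of past error messages and the fixes users applied.
PAST_FIXES = {
    "NameError: name 'pritn' is not defined": "Did you mean 'print'?",
    "IndentationError: expected an indented block": "Indent the body of the block above.",
    "TypeError: can only concatenate str (not 'int') to str": "Convert the number with str() before concatenating.",
}

def suggest(error_message):
    """Return the fix attached to the most similar previously seen error,
    a stand-in for learning suggestions from a database of user interactions."""
    match = difflib.get_close_matches(error_message, PAST_FIXES, n=1, cutoff=0.4)
    return PAST_FIXES[match[0]] if match else "No suggestion available."

print(suggest("NameError: name 'prnit' is not defined"))
```

A production system would cluster millions of logged interactions and learn which explanation actually led users to a successful fix, but the retrieval structure is the same.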

Human-Machine Partnership for Efficient Program Assessment

Machine learning has proven effective in helping humans handle large volumes of work. For instance, Chris Piech, Peter Norvig's student, developed a method to summarize and group submissions to Andrew Ng's online classes. By identifying clusters of similar programs, a single piece of feedback could cover a whole group, reducing the grading workload significantly.
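
A rough sketch of the grouping idea (not the actual method from the talk, which used far richer program representations): measure textual similarity between submissions and greedily merge near-duplicates, so one piece of feedback covers a whole group. The submissions and threshold below are illustrative.

```python
import difflib

# Hypothetical student submissions to the same exercise.
submissions = [
    "def area(r): return 3.14 * r * r",
    "def area(r): return 3.14*r*r",
    "def area(r): return r * r * 3.14159",
    "def area(r): return 2 * 3.14 * r",
]

def group_similar(programs, threshold=0.9):
    """Greedy single-pass grouping: each program joins the first group whose
    representative it resembles above the threshold, otherwise it starts a
    new group. A grader then reviews one representative per group."""
    groups = []
    for prog in programs:
        for group in groups:
            if difflib.SequenceMatcher(None, group[0], prog).ratio() >= threshold:
                group.append(prog)
                break
        else:
            groups.append([prog])
    return groups

for g in group_similar(submissions):
    print(len(g), "->", g[0])
```

Real systems compare abstract syntax trees or program behavior on test inputs rather than raw text, which lets them merge submissions that differ in variable names but not in logic.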

Technical Debt in Machine Learning

Machine learning can introduce technical debt, leading to future maintenance challenges. It is crucial to understand when and how to incur technical debt wisely. While machine learning enables rapid development, it requires careful consideration of long-term implications.

Challenges in Machine Learning

Deep Learning and Go:

– AlphaGo’s victory over Go champion Lee Sedol was a significant moment; AI’s rapid progress suggests that, as in chess, the era of human dominance in Go is over.

Fundamental Tools for Deep Learning:

– Deep Dream and image processing tools provide insights into deep networks but are limited to two-dimensional data.

– High-dimensional data, common in deep learning, poses challenges for visualization and interpretation.

Experimental Frameworks:

– Machine learning systems require better experimental frameworks to assess performance and progress.

– Persistent build systems, common in traditional software development, can be applied to machine learning systems for continuous testing and validation.
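
Treating model quality as part of the build is straightforward to set up: re-evaluate the model on a held-out set on every change, and fail the build when a metric drops below the last accepted baseline. A minimal sketch, where the threshold, "model", and data are all illustrative:

```python
# A regression test that treats model quality like any other build check:
# re-evaluate on held-out data and fail if accuracy drops below the last
# accepted baseline. Threshold, model, and data are illustrative.

BASELINE_ACCURACY = 0.90

def evaluate_model(predict, labelled_data):
    correct = sum(1 for x, y in labelled_data if predict(x) == y)
    return correct / len(labelled_data)

# Stand-in "model": classify numbers as 'big' above 10.
predict = lambda x: "big" if x > 10 else "small"
held_out = [(3, "small"), (15, "big"), (8, "small"), (20, "big"), (11, "big")]

accuracy = evaluate_model(predict, held_out)
assert accuracy >= BASELINE_ACCURACY, f"regression: accuracy {accuracy:.2f} below baseline"
print(f"accuracy {accuracy:.2f} meets baseline {BASELINE_ACCURACY:.2f}")
```

Run under a continuous build, such a check catches silent degradations from data or dependency changes, not just code bugs.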

Machine learning systems often lack clear abstraction barriers, making it difficult to predict outcomes. Changes in one part can have cascading effects, requiring careful consideration. Additionally, machine learning systems can be prone to failure if they are misused or applied in unsuitable contexts. The changing nature of data over time, known as the non-stationary effect, poses further challenges in maintaining the accuracy of machine learning systems.
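
A first line of defense against the non-stationary effect is to monitor live feature distributions against the training distribution. The sketch below uses a standardized mean shift as the drift signal; real systems use richer tests (e.g. the population stability index), and all numbers here are illustrative.

```python
import statistics

def drift_score(train_values, live_values):
    """How many training standard deviations the live mean has shifted:
    a simple stand-in for fuller distribution-drift tests."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(live_values) - mu) / sigma

train = [10, 12, 11, 13, 12, 11, 10, 12]     # feature values at training time
live_ok = [11, 12, 10, 13]                   # production data, same regime
live_drifted = [18, 19, 20, 21]              # production data after a shift

print(round(drift_score(train, live_ok), 2))       # small shift: fine
print(round(drift_score(train, live_drifted), 2))  # large shift: retrain
```

When the score crosses an agreed threshold, the system flags the feature for investigation or triggers retraining, rather than silently serving predictions from a stale model.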

The Future of Machine Learning in Software Engineering

Transition from Supervised to Unsupervised Learning:

– Unsupervised learning lacks an explicit cost function, making it challenging to evaluate progress.

Meta-learning and Data Structures:

– Meta-learning aims to optimize data structures for representing knowledge and improving transfer learning across tasks.

– Identifying shared knowledge and best practices for building global models is an ongoing research area.

The landscape of software engineering is undeniably shifting towards a closer integration with machine learning. This integration promises to revolutionize traditional practices, offering new dimensions of efficiency and capability. However, it also brings significant challenges to the fore, from the need for better tools and practices to the handling of technical debt and data issues. As we stand at this crossroads, the future of software engineering appears both exciting and daunting, filled with opportunities for innovation and areas demanding careful navigation.


Notes by: ZeusZettabyte