Hal Varian (Google Chief Economist) – Google and Social Science Research (Feb 2015)


Chapters

00:00:00 Google Data Tools for Social Science Research
00:03:12 Predicting House Sales with Google Trends Data
00:08:41 Predicting Housing Starts Using Google Search Data
00:12:58 Assessing Predictive Queries Using Google Correlate
00:22:44 Predicting Trends and Economic Phenomena with Google Trends
00:26:30 Google's Tools for Economic Research
00:35:11 Google's Survey Amplification Technique

Abstract

Harnessing the Power of Google Data for Predictive Modeling and Economic Forecasting

This article describes how Google Correlate, Google Trends, and Google Consumer Surveys can be used for predictive modeling and economic forecasting. It shows how these tools let researchers pose questions and answer them with Google data, and how unconventional data sources can improve our understanding of complex economic and social phenomena, as illustrated by the work of economists like Hal Varian. It also covers applications in several domains, including now-casting economic activity, political campaigns, and targeted product marketing.

Google Data Tools for Social Science Research

Hal Varian’s presentation on Google data tools for social science research focuses on three tools: Google Correlate, Google Trends, and Google Consumer Surveys. Google Correlate takes state-level or weekly/monthly time series data uploaded by the user and returns the queries most correlated with it. Google Trends displays an index of query activity for specific queries or query categories chosen by the user. Google Consumer Surveys offers an inexpensive and effective way to survey internet users.
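The ranking step that Google Correlate performs can be sketched in a few lines. This is only an illustration, assuming plain Pearson correlation as the similarity measure; the function names and all data below are hypothetical and are not taken from the talk or from any Google API.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_queries(target, query_series):
    """Return (query, correlation) pairs, most correlated first."""
    scored = [(query, pearson(target, series))
              for query, series in query_series.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical monthly figures, invented for illustration only.
house_sales = [50, 55, 62, 58, 49, 45]
queries = {
    "real estate agent": [48, 54, 60, 57, 50, 44],
    "pizza delivery": [30, 29, 31, 30, 32, 31],
    "snow shovel": [70, 40, 10, 5, 35, 80],
}

ranking = rank_queries(house_sales, queries)
for query, corr in ranking:
    print(f"{query}: {corr:.2f}")
```

The top-ranked queries would then be candidates for a predictive model, subject to the judgment about relevance discussed later in these notes.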

Google Correlate is central to the predictive modeling in the talk. It identifies search terms correlated with a given time series, such as house sales, and those queries then serve as candidate predictors. Steve Scott’s model decomposes a time series into trend, seasonal, and regression components, and uses Bayesian variable selection and model averaging to improve predictive accuracy. Averaging over an ensemble of models is shown to predict better than relying on any single model. The approach identified significant predictors of housing starts, including queries such as “appreciation rate” and “IRS 1031” (a tax-related query), fit the data closely in sample, and reduced the mean absolute error of forecasts by 23%.
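One common way to write a decomposition into trend, seasonal, and regression components is the structural time series form below. It is offered as a sketch: these notes do not give the exact specification used in the talk, so the local linear trend, the seasonal period S, and all symbols are assumptions. Here x_t holds the query series, and gamma is a vector of 0/1 indicators marking which queries a given model includes.

```latex
\begin{align}
y_t &= \mu_t + \tau_t + \beta^{\top} x_t + \varepsilon_t,
  & \varepsilon_t &\sim \mathcal{N}(0, \sigma_\varepsilon^2) \\
\mu_{t+1} &= \mu_t + \delta_t + u_t,
  & u_t &\sim \mathcal{N}(0, \sigma_u^2) \\
\delta_{t+1} &= \delta_t + v_t,
  & v_t &\sim \mathcal{N}(0, \sigma_v^2) \\
\tau_t &= -\sum_{s=1}^{S-1} \tau_{t-s} + w_t,
  & w_t &\sim \mathcal{N}(0, \sigma_w^2)
\end{align}

% Model averaging: weight each model's forecast by its posterior probability.
\begin{equation}
\hat{y}_{T+1} = \sum_{\gamma} p(\gamma \mid y_{1:T}) \,
  \mathbb{E}\left[ y_{T+1} \mid \gamma, y_{1:T} \right]
\end{equation}
```

Under this reading, variable selection corresponds to the posterior over gamma, and the ensemble forecast is the weighted sum in the last line rather than the forecast of any single model.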

The Risk of Spurious Correlations:

Researchers must be wary of spurious correlations when using Google Correlate. A statistically significant correlation does not by itself imply a meaningful or causal relationship, and it takes careful judgment to identify search terms that are genuinely predictive and relevant. One-time events and unrelated trends can produce strong correlations on their own, so each candidate query needs critical review.
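A small example shows how an unrelated trend can manufacture a strong correlation. The two series below are hypothetical and share nothing but an upward drift; comparing period-to-period changes instead of levels is one simple check (an assumption of this sketch, not a method attributed to the talk).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def first_differences(series):
    """Change from each period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


weeks = range(41)
# Hypothetical series: each is a trend plus its own unrelated wiggle.
series_a = [t + (1 if t % 2 == 0 else -1) for t in weeks]
series_b = [2 * t + (1 if t % 4 < 2 else -1) for t in weeks]

corr_levels = pearson(series_a, series_b)
corr_changes = pearson(first_differences(series_a),
                       first_differences(series_b))

print(f"correlation of levels:  {corr_levels:.3f}")   # close to 1
print(f"correlation of changes: {corr_changes:.3f}")  # zero
```

The levels look almost perfectly related, yet the week-to-week movements are uncorrelated, which is the pattern a shared trend produces.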

Economic Forecasting with Unconventional Data Sources:

Hal Varian’s approach to economic forecasting draws on unconventional data from sources like MasterCard and Google Correlate. Varian demonstrated that these tools can predict economic trends; for example, queries related to the housing market were strong predictors of state-level price declines during the recession. His work emphasizes incorporating both quantitative and qualitative factors in model building, using Bayesian methods to integrate expert judgment. The approach extends beyond economics, as seen in his research predicting lifespans based on Google search queries.

Google Trends in Analysis and Forecasting:

Google Trends offers insight into patterns of user behavior. For instance, queries for “hangover” and for martini recipes are correlated with a lag, meaning one series appears to lead the other, though causality remains unclear. In unemployment analysis, Google Trends data has tracked initial claims for unemployment benefits closely, which suggests it is useful for forecasting unemployment.
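A lag relationship like the hangover example can be checked by correlating one series against shifted copies of the other. The sketch below assumes the martini series leads and uses invented daily index values; the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leader, follower, lag):
    """Correlation between leader[t] and follower[t + lag]."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


def best_lag(leader, follower, max_lag):
    """Lag (0..max_lag) at which the lagged correlation is highest."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(leader, follower, lag))


# Hypothetical daily query indices over two weeks, invented for illustration.
martini = [10, 12, 11, 15, 30, 45, 20, 10, 13, 12, 16, 32, 44, 18]
# Assumption: hangover queries echo martini queries one day later.
hangover = [18] + martini[:-1]

lag = best_lag(martini, hangover, max_lag=3)
print(f"best lag: {lag} day(s)")
```

Finding the lag says only that one series moves first; as noted above, it does not establish that one causes the other.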

Now-casting Economic Activity

In times of crisis, timely access to economic data is crucial. Traditional data collection methods, such as GDP compilation, can be slow and subject to revisions. Alternative data sources, such as MasterCard transaction data, offer near real-time insights into economic activity. These datasets can be used for “now-casting,” providing a more up-to-date picture of the economy.
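The idea behind now-casting can be illustrated with a one-variable regression: fit the relationship between an official figure and a timely indicator over past periods, then apply it to the current period, where only the indicator is available. All numbers and names below are hypothetical, and a single-predictor least-squares line is a deliberate simplification.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical past months in which both series are known.
indicator_history = [100, 104, 98, 110, 107]  # e.g. a transaction index
official_history = [201, 209, 197, 221, 215]  # figure published with a delay

intercept, slope = fit_line(indicator_history, official_history)

# Current month: the indicator is already available, the official figure is not.
indicator_now = 112
nowcast = intercept + slope * indicator_now
print(f"nowcast for the current month: {nowcast:.1f}")
```

In practice the estimate would be revised as the official figure is published and later corrected.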

Correlating Survey Data with Google Trends:

Combining Google Consumer Surveys with Google Trends led to the development of “survey amplification.” The technique involves associating each survey response with the respondent’s city and building a predictive model using Google Trends categories.

Predictive Models for Survey Responses:

Predictive models are constructed using survey responses and characteristics of the geographic area from where the survey originated. These models utilize demographic predictors, such as age, income, and education, as well as query data from the geographic area.
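The steps of survey amplification can be sketched as follows: aggregate responses by city, fit a model on a city-level predictor, and predict the response for a city that was not surveyed. The sketch uses a single hypothetical Trends category index and a least-squares line as a stand-in; these notes do not specify the actual model, and the full approach would also include demographic predictors. City names and values are invented.

```python
from collections import defaultdict


def share_yes_by_city(responses):
    """Fraction of 'yes' answers per city from (city, answered_yes) pairs."""
    counts = defaultdict(lambda: [0, 0])  # city -> [yes count, total count]
    for city, answered_yes in responses:
        counts[city][0] += int(answered_yes)
        counts[city][1] += 1
    return {city: yes / total for city, (yes, total) in counts.items()}


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


# Hypothetical survey responses tagged with the respondent's city.
responses = [
    ("city_a", True), ("city_a", True), ("city_a", False), ("city_a", True),
    ("city_b", True), ("city_b", False),
    ("city_c", False), ("city_c", False), ("city_c", False), ("city_c", True),
]
# Hypothetical Trends category index per city; city_d has no survey responses.
category_index = {"city_a": 75, "city_b": 50, "city_c": 25, "city_d": 60}

shares = share_yes_by_city(responses)
surveyed = sorted(shares)
intercept, slope = fit_line([category_index[c] for c in surveyed],
                            [shares[c] for c in surveyed])

# Predict the 'yes' share for cities that appear only in the query data.
predicted = {city: intercept + slope * index
             for city, index in category_index.items() if city not in shares}
print(predicted)
```

The value of the technique is in the last step: query data exists for far more places than any survey reaches, so a modest sample can be extended to unsurveyed areas.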

Google Consumer Surveys for Insightful Data Gathering:

Google Consumer Surveys provides a fast, low-cost way to run surveys. It is particularly useful for testing question wording and for gathering insights on voting patterns. Combining Google Consumer Surveys with Google Trends to predict survey responses from geographic query patterns is the basis of survey amplification and supports targeted product marketing.



The talk closes by stressing that these Google tools are freely accessible and by encouraging researchers to explore and experiment with them. The tools can inform data analysis and predictive modeling across many domains. Their use in now-casting economic activity, particularly in periods such as that of the Obama administration’s economic stimulus package, highlights the importance of timely economic indicators. The work of Steve Scott and Hal Varian shows what these unconventional data sources and analytical techniques can achieve.

These notes cover both the technical aspects of Google’s data analysis tools and their practical use in real-world scenarios, and are intended for researchers, economists, and data scientists interested in data-driven decision-making.


Notes by: MatrixKarma