Hal Varian (Google Chief Economist) – Google Tools for Data (Sep 2016)
Abstract
Harnessing Google’s Digital Powerhouse: A Revolution in Economics and Forecasting (Updated)
In economics and data analysis, Google tools such as Google Trends, Google Correlate, and Google Consumer Surveys have changed how researchers study and forecast economic activity. These tools, presented in this lecture by economist Hal Varian, offer insights into human behavior, economic activity, and even happiness levels across cities. This summary covers their use in economic research and forecasting, including the Greek referendum held amid talk of a Grexit, and the ways they are reshaping data interpretation and demographic analysis.
Woytinsky Lectureship Award and Distinguished Presenters
The Woytinsky Lectureship Award, established in 1963 at the University of Michigan, honors exceptional contributions to the field of economics. Its namesake, Wladimir Woytinsky, was a Russian economist who significantly influenced the development of the US Social Security system. Distinguished presenters have included Gary Becker, Robert Barro, Tom Sargent, Peter Diamond, Zvi Griliches, Jim Heckman, Claudia Goldin, Angus Deaton, Bob Hall, and Susan Athey, several of them Nobel Prize winners.
Google Tools for Data Analysis and Economic Prediction
Google Trends allows users to explore search volume for specific terms over time, providing insights into public interest and behavior. For instance, the peaks in searches for “hangover” on Sundays and New Year’s Day reveal societal patterns. Google Correlate goes a step further by finding related search terms, as seen in the correlation between “vodka” and “hangover” searches, offering a window into consumer behavior. Google Consumer Surveys, meanwhile, enable quick, targeted online surveys, as demonstrated by their predictive power ahead of the 2015 Greek referendum, held amid talk of a Grexit.
Advanced Predictive Models Using Google Data
These tools have been instrumental in developing advanced predictive models. For example, they have been used to forecast unemployment rates, with the timing of queries offering real-time signals that can precede official data. Methods such as spike-and-slab regression and adaptive non-parametric methods have made economic forecasts more accurate by automating the selection of predictors.
Insights into Happiness and Life Expectancy
Google’s tools have also been used to identify patterns in societal well-being and health. Charlottesville, Virginia, was identified as the happiest city, and searches related to blood pressure medication were found to be strong predictors of shorter life expectancy. These findings highlight the potential of Google tools in public health and societal well-being research.
Challenges and Opportunities in Economic Forecasting
While Google Trends and other private sector data offer detailed, high-frequency insights, they also present challenges, such as proprietary restrictions and the need to combine them with traditional, low-frequency government data for comprehensive analysis. Nevertheless, the combination of these data sources represents a promising area of research for economic forecasting.
Data Access and Quality Considerations
Access to a panel of mobile phone users for surveys, along with the application of privacy filters in Google Trends, underscores the balance between data accessibility and privacy. Shifting from traditional surveys, such as those from Pew Research, to Google Consumer Surveys does not necessarily yield better data, but it offers a complementary perspective on the underlying reality.
Embracing the Digital Age in Economics
The integration of Google’s digital tools into economic research and forecasting signifies a pivotal shift towards embracing the digital age. While challenges remain, such as data representation and quality, these tools provide a new lens through which to understand and predict economic and social phenomena, marking a new era in the field of economics.
Supplemental Information
Google Tools for Data: Insights from Hal Varian’s Lecture
Google Trends, Google Correlate, and Google Consumer Surveys are powerful tools for data analysis presented by Hal Varian. Google Trends allows users to explore search trends for terms over time, revealing insights into popular search terms, regional interests, and seasonal patterns. Google Correlate identifies terms strongly correlated with a given search term, providing insights into user behavior and trends. Google Consumer Surveys enables businesses to conduct surveys with targeted audiences, providing insights into consumer preferences and behaviors.
Google Correlate: Uncovering Insights from Search Trends
Google Correlate uses search data to identify correlated terms, revealing insights into user behavior and trends. Users supply a term, a time series, or cross-sectional data, and the tool returns the queries whose search volume tracks that input most closely. Surprisingly, “weight loss” is strongly correlated with “best vacation spots,” plausibly because people think about losing weight when planning vacations. Search queries can also serve as predictors of economic activity, such as unemployment trends and home sales intentions.
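The core idea of ranking queries by how closely they track a target series can be sketched in a few lines of Python. This is only an illustration of correlation ranking, not Google Correlate's actual implementation; the function name, the query labels, and all numbers are made up for the example.

```python
import numpy as np

def rank_by_correlation(target, candidates):
    """Return (name, Pearson r) pairs sorted from most to least correlated."""
    target = np.asarray(target, dtype=float)
    scores = []
    for name, series in candidates.items():
        r = np.corrcoef(target, np.asarray(series, dtype=float))[0, 1]
        scores.append((name, float(r)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical weekly search-volume indices (invented numbers, not Google data).
target = [10, 12, 15, 11, 18, 20, 17, 25]
candidates = {
    "query_a": [20, 24, 30, 22, 36, 40, 34, 50],  # exactly twice the target
    "query_b": [5, 3, 8, 4, 6, 2, 7, 5],          # essentially unrelated
    "query_c": [25, 20, 17, 18, 15, 12, 11, 10],  # moves against the target
}

ranking = rank_by_correlation(target, candidates)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

A high correlation found this way is only a starting point: with many candidate queries, some will correlate with the target by chance, which is why the notes later discuss model selection.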
Serious Applications of Google Search Data
Beyond fun and interesting insights, Google search data has serious applications. It can track unemployment trends and home sales intentions, providing real-time insight into current economic activity. Unemployment-related queries, for example, indicate current unemployment trends before official data are released.
Forecasting with Query Data
Query Timing and Metric Relationship: It’s crucial to understand the relationship between the timing of a query and the measured activity, expenditure, or metric. Different metrics have distinct relationships with query timing.
Example: Initial Claims for Unemployment Benefits: Initial claims for unemployment benefits peak near the end of every recession, about six months before the unemployment rate itself peaks, which makes them a valuable leading indicator for forecasting unemployment rates.
Google Correlate for Identifying Predictive Queries: Google Correlate can find queries correlated with initial claims, such as queries related to unemployment in different states, serving as strong indicators of future initial claims.
Predictive Strength of Query Data: Adding query data to simple autoregressive models significantly improves prediction accuracy. Similar improvements are observed with more sophisticated ARIMA models.
General Applicability: Query data is useful for forecasting a variety of metrics in many cases. The example of initial claims for unemployment benefits is one of many potential applications.
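The comparison between a simple autoregressive model and the same model with query data added can be sketched as follows. This is a minimal illustration on synthetic data, assuming an AR(1) baseline and a single contemporaneous query index; the coefficients, the series, and the helper name `fit_and_score` are invented for the example and are not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data (hypothetical): the metric depends on its own lag and on a query index.
query = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 1.0 * query[t] + rng.normal(scale=0.3)

def fit_and_score(X, target, n_train):
    """Fit OLS on the first n_train rows; return mean absolute error on the rest."""
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X[:n_train], target[:n_train], rcond=None)
    pred = X[n_train:] @ beta
    return float(np.mean(np.abs(target[n_train:] - pred)))

lagged = y[:-1]        # y_{t-1}
current = y[1:]        # y_t
query_now = query[1:]  # query index observed in period t

n_train = 150
mae_base = fit_and_score(lagged.reshape(-1, 1), current, n_train)
mae_aug = fit_and_score(np.column_stack([lagged, query_now]), current, n_train)

print(f"AR(1) only:         MAE = {mae_base:.3f}")
print(f"AR(1) + query data: MAE = {mae_aug:.3f}")
```

The coefficients here are fitted once and then held fixed over the evaluation window; a rolling re-estimation would be closer to how a forecast is produced in practice. The improvement is large by construction, since the synthetic metric was built to depend on the query index.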
Advanced Forecasting Techniques Using Google Correlate: Beyond Simple Correlation
Spike-and-Slab Regression: A Bayesian variable-selection method that assigns each candidate predictor a probability of being included in the model, which keeps model selection flexible and adaptive.
Kalman Filter Integration: A Kalman filter captures trend and seasonality in the time series, leaving spike-and-slab regression to select among the remaining predictors.
Component-by-Component Analysis: Adaptive non-parametric methods facilitate the analysis of individual components, including trend, seasonality, and regression components, identifying and explaining spurious correlations.
Automating Predictor Selection: Google Correlate helps identify the most predictive trends for a given time series, automating the process of selecting the best regressors.
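The inclusion probabilities behind spike-and-slab regression can be illustrated with a small sketch. It makes several simplifying assumptions that are not from the lecture: the slab is a Zellner g-prior, every candidate has the same prior inclusion probability, and all models are enumerated exactly instead of being sampled by MCMC. It also omits the Kalman-filter trend and seasonal components entirely. The function name and the data are hypothetical.

```python
import itertools
import numpy as np

def inclusion_probabilities(X, y, prior_inclusion=0.5, g=None):
    """Posterior inclusion probability of each column of X.

    Spike: the coefficient is exactly zero (variable excluded).
    Slab: Zellner g-prior on the coefficients of the included variables.
    All 2^k models are enumerated, so this only suits a small number of candidates.
    """
    n, k = X.shape
    g = float(n) if g is None else g   # assumed default: g equal to the sample size
    Xc = X - X.mean(axis=0)            # centering absorbs the intercept
    yc = y - y.mean()
    total_ss = yc @ yc
    log_weights, models = [], []
    for mask in itertools.product([0, 1], repeat=k):
        idx = [j for j in range(k) if mask[j]]
        p = len(idx)
        if p == 0:
            r2 = 0.0
        else:
            beta, *_ = np.linalg.lstsq(Xc[:, idx], yc, rcond=None)
            resid = yc - Xc[:, idx] @ beta
            r2 = 1.0 - (resid @ resid) / total_ss
        # Log Bayes factor against the intercept-only model under the g-prior.
        log_bf = (0.5 * (n - 1 - p) * np.log1p(g)
                  - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))
        log_prior = p * np.log(prior_inclusion) + (k - p) * np.log1p(-prior_inclusion)
        log_weights.append(log_bf + log_prior)
        models.append(mask)
    log_weights = np.array(log_weights)
    weights = np.exp(log_weights - log_weights.max())  # stabilize before normalizing
    weights /= weights.sum()
    return weights @ np.array(models, dtype=float)

# Synthetic example: only candidates 0 and 3 actually drive y.
rng = np.random.default_rng(1)
n, k = 120, 5
X = rng.normal(size=(n, k))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)

probs = inclusion_probabilities(X, y)
for j, p in enumerate(probs):
    print(f"candidate_{j}: inclusion probability {p:.2f}")
```

With hundreds of candidate queries, enumeration is infeasible and a sampler over models is used instead; the quantity being estimated, the posterior probability that each predictor belongs in the model, is the same.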
Understanding Relationships Between Queries, Demographics, and Societal Trends with Google’s Tools
Google Trends and Correlate: Provide insights into topics like unhappy cities, life expectancy predictors, and geographical variations in preferences. Negative life expectancy predictors include queries on blood pressure medication, Obama-related topics, and gun-related terms.
Google Consumer Surveys: A commercial product for conducting simple, quick, and inexpensive surveys. It is a cost-effective alternative to traditional surveys, with results delivered within hours, and is useful for marketing, product development, and understanding consumer preferences.
Geoamplification: Combines survey results with query patterns to amplify and extrapolate insights, identifying geographical areas where certain messages or products resonate strongly based on search behavior.
Hard Places Index: An index developed by the New York Times to measure the difficulty of living in different parts of the U.S. Google Correlate can identify queries associated with hard places, such as those related to social security disability, health concerns, and religious topics.
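One possible reading of geoamplification, as described in the list above, is a two-step procedure: relate survey responses to a query index in the regions that were surveyed, then use that relationship to estimate responses in areas where only query data is available. The sketch below assumes a simple linear fit; the numbers, area names, and the choice of a linear model are all hypothetical.

```python
import numpy as np

# Hypothetical data: share answering "yes" in surveyed regions, and a
# search-volume index for a correlated query in those same regions.
surveyed_query_index = np.array([20.0, 35.0, 50.0, 65.0, 80.0])
surveyed_yes_share = np.array([0.22, 0.30, 0.41, 0.48, 0.60])

# Fit share = intercept + slope * query_index by least squares.
slope, intercept = np.polyfit(surveyed_query_index, surveyed_yes_share, deg=1)

# Query index for finer-grained areas that were not surveyed (hypothetical).
unsurveyed = {"metro_a": 28.0, "metro_b": 72.0, "metro_c": 45.0}

# Extrapolate, clipping to the valid range for a share.
estimates = {
    area: float(np.clip(intercept + slope * index, 0.0, 1.0))
    for area, index in unsurveyed.items()
}

for area, share in estimates.items():
    print(f"{area}: estimated share = {share:.2f}")
```

The estimates are only as good as the survey-to-query relationship, so areas whose query index falls far outside the surveyed range deserve extra caution.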
Key Points from Hal Varian’s Presentation on Using Data for Economic Analysis
High-Frequency Data in the Private Sector: Private companies such as Google, UPS, Visa, Walmart, and FedEx possess real-time data systems that offer insights into current business activities.
Complementary Nature of Private and Government Data: Private sector data offers high frequency and detailed information, while government data provides carefully constructed long-term trends. Combining these two types of data can enhance economic models and forecasting.
Opportunities for Research and Thesis Topics: Challenges lie in combining private and government data sets to create superior economic models for both current and long-term trends. Hal Varian encourages senior theses based on this data integration.
Example of Google Search Data: Google Trends reveals a significant increase in searches for “withdrawal penalty” during the Great Recession.
Historical Thought Analysis using Google Ngrams: Google Ngrams enables researchers to search for phrases across time in the corpus of books scanned by the Google Books library project. This tool is valuable for analyzing historical trends and the popularity of topics.
Challenges in High-Frequency Trading: While Google provides hourly frequency data, many market activities occur on millisecond frequencies. Despite this limitation, several institutions use Google queries to analyze events such as movie openings and earnings reports.
Personally Identifiable Data in Google Searches: Google collects information such as IP address, time, and URL when users conduct searches. Personally identifiable information can be linked to a user through login status, web history, and ad preferences.
Hal Varian on Data Availability and Google Consumer Surveys
Microdata Access for Researchers: Access to complete Google data is limited because of privacy concerns. A privacy filter suppresses data that come from too few distinct IP addresses.
Mobile Phone User Panel: Google maintains an opt-in panel of mobile phone users for targeted research. Users are compensated with music and video downloads. The panel is a valuable demographic for marketing and for studies of young people’s behavior, and surveys can be run through it, potentially at a cost.
Comparison of Google Consumer Surveys and Pew Research: The two methods are seen as complementary rather than as replacements for each other, since each offers a different view of the underlying reality. A case study shows the two reporting consistent changes in class perception after the recession.
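The privacy filter mentioned under microdata access, which withholds data backed by too few distinct IP addresses, can be sketched as a simple threshold rule. The threshold value, the log format, and the function name below are assumptions made for illustration; the notes do not state how Google implements the filter.

```python
from collections import defaultdict

# Hypothetical threshold; the real value is not given in the lecture notes.
MIN_DISTINCT_IPS = 3

def filtered_counts(log, min_distinct_ips=MIN_DISTINCT_IPS):
    """Count searches per (query, region), dropping cells seen from too few distinct IPs."""
    ips = defaultdict(set)
    counts = defaultdict(int)
    for query, region, ip in log:
        ips[(query, region)].add(ip)
        counts[(query, region)] += 1
    return {
        cell: counts[cell]
        for cell in counts
        if len(ips[cell]) >= min_distinct_ips
    }

# Made-up search log entries: (query, region, IP address).
log = [
    ("hangover", "MI", "10.0.0.1"),
    ("hangover", "MI", "10.0.0.2"),
    ("hangover", "MI", "10.0.0.3"),
    ("hangover", "MI", "10.0.0.1"),
    ("rare query", "MI", "10.0.0.9"),
    ("rare query", "MI", "10.0.0.9"),
]

result = filtered_counts(log)
print(result)
```

Counting distinct addresses instead of raw searches means that one user repeating a query many times cannot push a cell over the threshold.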
Hal Varian on Data Analysis, Information Value, and Education in Academia and Tech
Data Analytics in Practice: Hal Varian highlights the value of using Google Ngrams for quick data collection and analysis, enabling rapid insights into various topics. He emphasizes that survey results gathered through online news sources come from a selected, non-random sample, and that this must be kept in mind when interpreting them.
Real-Life Data Challenges: Varian discusses the complexities of data analysis in the real world, where data is often messy and evolving, unlike the pristine models seen in academic settings. He advocates for providing students with experience working on unfinished data to prepare them for real-life data analysis scenarios.
Iterative Process of Research: Varian stresses that research in academia and industry often involves an iterative process, requiring adjustments as new data emerges. He advocates for conveying this iterative nature of research to students, as it differs from the polished final products typically presented in academia.
Ngrams and International Data: Varian explains that Google Ngrams allows for searching foreign words, but primarily those found in English language libraries due to fair use considerations in the U.S. He notes that scanning information from books in continental Europe is currently restricted due to the lack of a fair use provision.
Notes by: ZeusZettabyte