
Microfinance and the Effects of Energy Drinks: Which Causes Which?

September 09, 2011

One task of chapter 6 of Due Diligence is to inventory econometric bugaboos, reasons you shouldn't trust most of the published studies of the impact of microfinance. For example:

A final danger amplified by technical sophistication is “data mining,” which is the process of sifting for certain conclusions, consciously or unconsciously. A statistical analysis (a “regression”) can be run in a vast number of ways—varying the technique, which data points are included, which variables are controlled for, and so on. The laws of chance say that even if variables of interest are statistically unrelated, some ways of running the analysis will discover an improbable degree of correlation. Even a fair coin sometimes comes up heads five times in a row. Every step in the research process tends to favor the selection of those regressions that show significant results, meaning those that are superficially difficult to ascribe to chance. A researcher who has just labored to assemble a data set on microcredit use among 2,000 households in Mexico, or to build a complicated mathematical model of how microcredit boosts profits, will feel a strong temptation to zero in on the preliminary regressions that show microcredit to be important. Sometimes it is called “specification search” or “letting the data decide.” Researchers may challenge such findings with less fervor than they ought. Research assistants may do all these things unbeknownst to their supervisors. Then, tight for time, a researcher may be more likely to write up the projects with apparently strong correlations, deferring others with the best of intentions. And if two researchers with the highest standards study the same topic in somewhat different ways, the one finding the significant result is more likely to win publication in a prestigious journal.
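
To put rough numbers on the coin-flip point in the passage above, here is a minimal sketch of the arithmetic. The 5 percent significance threshold, the counts of regressions, and the function names are my own illustrative assumptions, not figures from the book. It also assumes the regressions are independent of one another, which regressions run on the same data set are not, so treat the results as a rough guide rather than an exact rate.

```python
# Illustrative arithmetic only. The 5% threshold and the numbers of
# regressions tried are assumptions, not figures from the quoted passage.

def prob_heads_in_a_row(n: int) -> float:
    """Chance that a fair coin lands heads on n consecutive flips."""
    return 0.5 ** n


def prob_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests independent tests on truly
    unrelated variables comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    print(f"Five heads in a row: {prob_heads_in_a_row(5):.4f}")
    for k in (1, 10, 20, 100):
        print(f"{k:>3} regressions tried: {prob_any_false_positive(k):.3f}")
```

Under these assumptions, five heads in a row has a 1-in-32 chance, and a researcher who tries 20 independent specifications has roughly a 64 percent chance of seeing at least one "significant" result even when nothing is there.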

Last May I went to Seattle to speak to members of Global Washington about my work. While there, a friend who now works for Google gave me a tour of the company's Seattle office. He described what it is like to write programs that will run on 10,000 computers at once, and how interns are sometimes told to test new ideas by first "making a copy of the entire web," that is, duplicating the file that contains all the text on the web.

No one (well, almost no one) can mine data like Google. Here's a great example of the wisdom that data mining can bestow. Go to google.com/trends/correlate and type in "microfinance". Google will show you which other search term has moved most in lockstep with "microfinance," as measured by popularity from week to week. Here's what I got (click to go to the interactive version of the graph):

Of course, to inform policy, we need to go beyond mere correlation, to determine whether interest in microfinance is driving people to look twice at high-calorie drinks or the other way around.

Seriously, Google is a rich and powerful company precisely because of its mastery in extracting information from unimaginable quantities of data. "Data mining" can be very useful. But when the search is not properly disciplined, nonsense can result, and this is a great illustration. Hat tip to @charlesjkenny for inspiration.
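
For readers who want to see how an undisciplined search manufactures a tight match, here is a small simulation sketch. It is not Google's method, which I am not describing here. It simply generates one random weekly series as a stand-in for "microfinance," generates many unrelated random series as stand-ins for other search terms, and reports the highest correlation found. The random-walk model, the 200-week length, and all function names are assumptions made for illustration.

```python
import math
import random


def random_walk(rng: random.Random, n_weeks: int) -> list[float]:
    """A made-up weekly 'popularity' series: cumulative Gaussian noise."""
    level, series = 0.0, []
    for _ in range(n_weeks):
        level += rng.gauss(0.0, 1.0)
        series.append(level)
    return series


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_match(n_candidates: int, n_weeks: int = 200, seed: int = 0) -> float:
    """Highest correlation between one random target series and
    n_candidates other random series generated independently of it."""
    rng = random.Random(seed)
    target = random_walk(rng, n_weeks)
    return max(
        pearson(target, random_walk(rng, n_weeks)) for _ in range(n_candidates)
    )


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>4} candidates searched: best r = {best_spurious_match(n):.3f}")
```

Because the seed is fixed, each larger search includes every series from the smaller one, so the best correlation found can only stay the same or rise as more candidates are sifted. None of the series has anything to do with the target; the apparent lockstep comes entirely from the breadth of the search.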

Disclaimer

CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.
