
The Aid Fungibility Debate and Medical Journal Peer Review

September 14, 2012
The Lancet just published a letter I wrote questioning an influential study in its pages that concluded that most or all foreign aid for health goes into non-health uses. The letter follows up on concerns I expressed in this space in April 2010. Why the 2.5-year lag? Only this past January did the Seattle-based Institute for Health Metrics and Evaluation (IHME) share the data set and computer code that it used to generate the published findings. And only with those in hand could I check my concerns and describe them to others with credibility. (I'm grateful to the kind people at IHME who gave me the data and code, but don't want to let the institution per se off the hook.)

Confusingly, in May the Public Library of Science published another critique of the same article. I questioned that reanalysis, and it was eventually retracted.

Here, I sketch my argument, comment on the reply from Chunling Lu and Christopher Murray, then call out the Lancet for a certain lack of transparency, as well as for sometimes bringing more reputation than rigor to policy-relevant social science research.

The study

The IHME article contends that countries receiving lots of aid for health shift their own funding away from health. The upshot is "aid displacement" or "fungibility": donors meaning to fund health effectively fund other things. The thesis makes sense. What Malawi wants for Malawi---i.e., how it would allocate its budget across competing priorities---almost certainly differs from what the U.K. wants for Malawi. So if the U.K. gives Malawi lots of money for health, Malawi may well cut back its own spending on health in favor of schools or roads (or jets). On the other hand, if health aid is completely fungible, then PEPFAR hasn't put a single HIV-positive person on treatment and the great debate over whether donors are spending too much on HIV/AIDS is a farce...which doesn't ring true. Meanwhile, health aid must sometimes be the opposite of fungible, attracting rather than supplanting recipient government spending on health. Overall, common sense says that the fungibility of aid varies by time, place, and purpose, probably between 0% and 100%.

Less clear is whether statistical analysis can add certainty to this common sense. If you've followed the arguments over what constitutes rigor in impact evaluation, you know why. It's one thing to show that two variables are negatively correlated---that when health aid is high, recipients' own health spending is low. It's another thing to determine why. As I wrote before, "Maybe aid is indeed reducing spending by the receiving governments. Maybe governments with lower domestic health spending attract more health aid (reverse causality)." Analytically, the best way around this problem is randomization (or the fortuitous occurrence of a clean natural experiment). But I doubt any donor will randomize how much health aid it gives to each country.

The IHME researchers recognized this problem, so they deployed a complicated statistical method called System GMM, which was made popular by canned statistical programs such as my own. It is supposed to expunge problems like reverse causality from the data, thus allowing causation to be inferred from correlation. I recognized a problem in how they applied it. Before IHME, enough researchers had tripped over the very problem I spotted that I documented it (claiming no originality) in A Note on the Theme of Too Many Instruments. Too many instruments, you see, weaken the Hansen J test of instrument validity.
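To make that concrete, here is a stylized Monte Carlo sketch in Python. It is my own toy construction, not IHME's model, not the health aid data, and not the Stata analysis behind my letter: a simple linear two-step GMM estimator is handed deliberately invalid instruments, and the Hansen J test catches the violation when instruments are few but tends to wave it through once their number approaches the sample size. Every name and parameter in it is hypothetical.

```python
# Stylized sketch (my own toy setup, not IHME's model or data): instrument
# proliferation blunts the Hansen J overidentification test.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(12345)

def hansen_j_pvalue(n, k):
    """Simulate one dataset with invalid instruments; return the Hansen J
    p-value from two-step linear GMM with k instruments."""
    Z = rng.normal(size=(n, k))                      # candidate instruments
    v = rng.normal(size=n)                           # shock shared by x and u (endogeneity)
    # Exclusion restriction deliberately violated: the structural error loads
    # on the first two instruments with opposite signs, so no single
    # coefficient estimate can absorb the violation.
    u = 0.8 * (Z[:, 0] - Z[:, 1]) + v + rng.normal(size=n)
    x = Z.mean(axis=1) + v + rng.normal(size=n)      # endogenous regressor
    y = 0.5 * x + u
    X = x[:, None]

    # Step 1: 2SLS (regress y on the instrument-projected x).
    Pz = Z @ np.linalg.pinv(Z)
    b1 = np.linalg.lstsq(Pz @ X, y, rcond=None)[0]
    e1 = y - X @ b1

    # Step 2: efficient GMM with a heteroskedasticity-robust weighting matrix.
    S = (Z * (e1 ** 2)[:, None]).T @ Z / n           # estimated moment covariance
    W = np.linalg.pinv(S)
    b2 = np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)
    e2 = y - X @ b2

    g = Z.T @ e2 / n                                 # sample moments at the GMM estimate
    J = n * (g @ W @ g)                              # Hansen J statistic
    return chi2.sf(J, df=k - 1)                      # k - 1 overidentifying restrictions

n, reps = 200, 100
for k in (4, 20, 90, 180):
    p = np.median([hansen_j_pvalue(n, k) for _ in range(reps)])
    print(f"{k:3d} instruments: median Hansen J p-value = {p:.3f}")
# In runs of this sketch, a handful of instruments typically yields tiny
# p-values (the invalidity is flagged), while an instrument count near the
# sample size pushes the p-value up and the same invalid instruments pass.
```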
(Nod knowingly for the next couple of sentences and you'll be fine.) So researchers should test their System GMM results for resilience to radical reduction in the instrument count. The IHME researchers did not report doing that.

The letter

In my letter, I shrink the instrument set in an elegant way (after making two unrelated corrections). As the instrument count falls, the Hansen test---which indicates how confident we should be that IHME removed reverse causality and other bugaboos---presumably regains its power, and it begins to return bad results. "The simple, conservative explanation for these results is that the instruments are invalid throughout, and that the invalidity was missed in the published regressions because instruments were numerous....[T]he analysis...does not support the confident claim that health aid is displaced." My data and code are below.

The reply

The IHME responded to my letter, but not to its argument:
Although methodological improvements in statistical estimation are always welcome, we disagree with Roodman's conclusion.
They disagree because they have been steadily improving their methods over the last two years, and their finding "remains remarkably robust both to new data and the testing of an extremely wide range of models." Indeed, I engaged over several weeks this summer in an excellent discussion with people at IHME working on their second-generation analysis, and I am certain it represents a step forward. Whether they will surmount the fundamental difficulty of inferring causation from correlation without a clean experiment remains to be seen. It is not easily done, but it is worth trying, especially if the results are not over-confidently interpreted.

Three more comments on the IHME reply:
  • The reliance on future, unpublished work seems to concede that the old paper doesn't quite prove its claim. Pending the completion, publication, and scrutiny of the new work, I think you should conclude that, Associated Press and New York Times to the contrary, the average fungibility of health aid has not been determined.
  • The reply subtly elides the distinction between correlation and causation, and in this way deflects responsibility for the popular (mis)impression that causality had been proved. Phrasing such as "governments move their own resources to other sectors when they receive development assistance for health" implies that the second causes the first without quite saying it, tiptoeing away from the more emphatic statements in the original piece.
  • The reply defends the finding of fungibility as "robust." But I did not suggest that the correlation is a statistical house of cards. I questioned its interpretation, not its existence. The correlation between aid received and spending-from-own-resources may be real; what causes it is less certain.
The big stories

The Lancet axed the final paragraph of my submission. In those last sentences, I pivoted from criticizing the article to criticizing the journal. I can't get very cynical about the cut since I had strayed from my topic and my word count was nearly twice the official limit of 400. Still, what I wrote there needs to be said and needs to be read:
This episode raises two questions for the Lancet. How can it ensure that studies using methods from economics are adequately scrutinized during review? And should the Lancet adopt a transparency policy to prod authors into sharing code and data, so that problems like these will not go undetected for years? The Lancet currently, if tacitly, endorses opacity in research, which is antithetical to replication, thus to the efficient advance of science.
I'll unpack that:
  • In one of his last posts on Aid Watch, Bill Easterly warned of an outbreak of dodgy social science research among top-tier medical journals. With characteristic rigor, he developed a theoretical model of the publication process for health policy--relevant social science research, then formalized the model in a flow chart. [Figure: Bill Easterly's Aid Watch flow chart of the health policy research submission process.] Experts on the statistical methods used in the IHME article well understand the issue I spotted. So most likely such experts were either not consulted or not listened to. And Bill showed that this is not an isolated case. Indeed, since he wrote, comments from my colleague Michael Clemens and coauthors forced the partial retraction of another Lancet paper deploying social science methods to influence public policy---this one assessing the impact of the Millennium Villages Project. My limited exposure to medical journals suggests that something is seriously wrong with the peer review process. But it should be easy to fix: get good econometricians to review econometrics.
  • My axed 'graph proposed a way to mitigate the consequences of soft scrutiny: crowd-source post-publication review by posting online all data and code used in published papers. The irony is that economics lags well behind medicine in the construction of institutions such as clinical trials registries to assure rigor in research. Yet when the medical journals make forays into econometrics they fall short of best practice in social science: data and code sharing are now de rigueur at top economics journals. Last year, CGD began requiring it.
To wit, here are the data and code behind my letter:

For the code behind the IHME paper, contact Michael Hanlon.

Disclaimer

CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.