Randomized controlled trials (RCTs) are, at this point, a well-established part of the development research toolkit. Yet policymakers, researchers, and others still debate how best to learn from RCTs, what they can teach us (and what they can’t), what ethical challenges they bring, and how big a part of that toolkit they should be. Late last year, Bédécarrats, Guérin, and Roubaud edited a 450-page volume on the topic—Randomized Control Trials in the Field of Development: A Critical Perspective. Here’s my take, published in the journal Population and Development Review a few days ago.
Debates about the value and the ethics of randomized controlled trials (RCTs) in development economics have been active for at least the past 20 years, since a group of prominent economists began publishing the results of RCTs on a range of development issues. Debates about RCTs, both in high-income countries and development settings, have existed for much longer, but the past two decades have seen a marked increase in the production of RCTs in low- and middle-income countries and, with them, a host of criticisms. On the one hand, the 2019 award of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel to Abhijit Banerjee, Esther Duflo, and Michael Kremer for their work using experiments to illuminate solutions to global poverty provided official recognition of the work; on the other, it spurred further critical discussion.
A new volume, Randomized Control Trials in the Field of Development: A Critical Perspective, edited by Florent Bédécarrats, Isabelle Guérin, and François Roubaud, seeks to add to this debate with a collection of 13 studies, along with an introduction by the editors and a set of four interviews (with an Indian policymaker, an Indian government advisor, a French aid official, and a French aid researcher). The editors assemble an array of voices, mostly economists but also medical doctors, water and sanitation specialists, a biostatistician, and others.
Sometimes the volume feels like a true debate. Pritchett (Chapter 2) argues that RCTs distract from a more holistic view of national development in favor of a focus on specific targets (such as “eradicating extreme poverty”). Morduch (Chapter 3) counters that “systemic change is not always possible, and sometimes leaves parts of populations behind. Broadening access and service delivery, and expanding the provision of basic goods, remains a fundamental agenda for governments, aid agencies, and foundations.” Morduch also pushes back against the idea that RCTs drove a shift in focus away from macroeconomic growth, providing evidence that a shift towards private goods began two decades earlier. In another instance, Ravallion (Chapter 1) proposes that RCTs “get less critical scrutiny than other methods,” whereas Vivalt (Chapter 11) highlights, in a related if not direct response, that RCTs show less evidence of specification searching (i.e., dropping, adding, or transforming variables to get a statistically significant result) than other studies. Ogden (Chapter 4) provides a taxonomy of seven classes of RCT critiques, including many of those in other chapters (along with others unmentioned in this volume), and also provides responses to many of them.
In other places, the argument feels less balanced, as in the article-length critique of the 2015 special issue of American Economic Journal: Applied Economics on RCTs evaluating microcredit (Chapter 7). To be fair, the editors state clearly that they invited ten famous researchers who use RCTs (“randomistas,” in the preferred term of the volume) to participate in the volume, and those researchers declined. Why so? While I do not know the specific motivations, Ogden makes the argument that, while RCT proponents have grown less likely to engage in active debate with critics over the method, the RCT movement has evolved significantly, functionally responding to many of the critiques, with experiments on a wider range of topics, longer timeframes for evaluation, increased use of multiple arms to test alternative mechanisms, and more engagement with policy.
I have published the results of RCTs (as well as quasi-experimental studies and reviews of both RCTs and quasi-experimental studies), and I was tempted to assume a defensive crouch while reading this volume. Many of the critiques throughout are not exclusive to RCTs but apply just as well to quasi-experimental studies and, in some cases, to any empirical research. (Many of the chapter authors explicitly recognize this in their discussions.) As economist Pamela Jakiela put it years ago, “for some reason they keep spelling ‘study’ as R-C-T” (quoted by Ogden in this volume). Here are some examples: misreporting of studies by the media and a failure of authors to correct it (Spears, Ban, and Cumming, Chapter 6); reporting estimates as facts in a popular book based on research (Deaton, Introduction); poorly designed questionnaires and poor reporting of study details (Bédécarrats, Guérin, and Roubaud, Chapter 7); piecemeal and unsustainable solutions with insufficient systemic considerations (interview with Gulzar Natarajan); the fact that “what works” to solve a problem may vary across contexts (Deaton, Introduction); and poor choice of outcome variables or insufficient sample size (Garchitorena et al., Chapter 5). Yet while these problems are not unique to RCTs, neither are RCTs exempt from them. Hopefully, practitioners of other methods will likewise find inspiration to improve here.
At least two critiques highlighted in the volume do apply principally to RCTs. The first is that RCT advocates claim that RCTs sit at the top of a hierarchy of empirical methods (i.e., they represent a “gold standard”). Deaton, Ravallion, and Heckman each discuss this at length, highlighting that RCTs face their own statistical inference challenges, especially, though not only, when implementation is imperfect (which it usually is), and that RCTs may be good at identifying an average effect of a treatment, but that this is often not the most policy-relevant statistic. (There is much more, but that’s a taste!) While most of the quotes used to establish that RCT practitioners claim pride of place are from well-known advocates (like Banerjee and Duflo), another researcher cited as placing RCTs at the top of a hierarchy is econometrician Guido Imbens, who is not a practitioner of RCTs.
My impression is that much of the worry stems from the concern that “gold standard” language leads some people to believe that, as Ravallion puts it, “RCTs are not just top of the menu of approved methods, nothing else is on the menu!” The extreme version of this position clearly does not apply to the most well-known producers of RCTs. For example, although Bédécarrats, Guérin, and Roubaud define randomistas as “proponents who are convinced that RCTs are the only way to rigorously assess impact in evaluation, and that they are superior to other methodologies in all cases,” all three winners of the Nobel for their experimental work have also published quasi-experimental and descriptive work. (Banerjee and Duflo, together with Qian, published a quasi-experimental evaluation of roadbuilding just last year!) Yet a form of this position does manifest in reviews of the literature (either standalone or within empirical papers) that only consider RCT evidence, implicitly or explicitly imposing the assumption that only RCTs deliver impact evidence of value. Spears, Ban, and Cumming (Chapter 6) quote relevant earlier work by Deaton and Cartwright: randomization “does not relieve us of the need to think.”
A second critique that weighs more heavily on RCTs than on observational studies is ethical. Quasi-experimental studies have ethical issues as well (any data collection or even data use may require ethical considerations), but RCTs have the additional ethical challenge of manipulating treatment. (Again, RCTs are not unique in manipulating treatment for the purpose of evaluation, but I would propose that they do it much more commonly than most quasi-experimental approaches.) In their thought-provoking article, Abramowicz and Szafarz (Chapter 10) ask “should economists care about equipoise?” Equipoise is the principle that in advance of the RCT, researchers should be genuinely ignorant as to whether the treatment is beneficial or not. (Or, if an RCT is testing two alternative treatments, researchers should be ignorant as to which is best.) This plays an important role in medical ethics, but development economists leave it largely undiscussed in their work. In their defense, economists may argue that many interventions that advocates support are not actually proven and that RCTs have demonstrated zero effects for interventions that intuition or anecdotal experience suggested would be effective. Yet there are interventions (cash transfers are an easy example, now that hundreds of studies have examined them across many contexts) for which it is difficult to say that the treatment group is not likely to be better off than the control group.
RCT implementers may further defend a departure from equipoise by proposing that rationing will take place anyway in cases where there are insufficient resources to benefit everyone, and that randomizing may be fairer than other allocations. But as Ravallion points out, we often do have some information about who is likely to benefit the most (e.g., the poorest!). Even Ogden, whose article offers the most robust defense of RCTs in the volume, comes up empty on this one: “On the questions of equipoise, as noted above, this remains an area where the RCT movement has yet to significantly engage as best I can tell.” Yet even this may be shifting in the wake of recent controversies around the ethics of certain RCTs. A group of prominent economists, including some whom the editors of this volume would call “randomistas,” have proposed that social science RCTs include ethical discussions, including a discussion of equipoise and, in the case of scarce resources, a rationale for why randomization was better than targeting specific groups for benefits. I suspect that norms will evolve significantly in the coming years in this regard.
The volume includes much of interest that I have not touched on in detail here. Morduch (Chapter 3) highlights how RCTs, even if one is unconvinced of their value for evaluation, are valuable for exploring new types of “economic contracts, behaviors, and institutions.” Vivalt (Chapter 11) explores how incorporating prior beliefs from policymakers can help us learn more from RCTs and other evaluations. Garchitorena et al. (Chapter 5) advocate for including faster-moving, nonrandomized implementation research in health delivery, a plea that is echoed in the interview with Indian policymaker Gulzar Natarajan at the end of the volume. On the whole, the volume delivers much of value, even if not all critiques are unique to RCTs.
A final point, raised repeatedly in the volume, is the hopefully obvious fact that RCTs cannot answer all questions and that even those questions that are well answered by an RCT are often best answered in complement with other methods. Questions about economic growth and trade policy are not amenable to randomization, and RCTs by themselves will not yield deep, thick characterizations of health systems and bureaucracies. An RCT will not reveal whether the goals of a program “were worth pursuing in the first place” (Picciotto, Chapter 9). Ultimately, as Spears, Ban, and Cumming put it in their discussion of water and sanitation evaluations (Chapter 6), “there is no gold standard other than careful, thoughtful research.” This standard leaves lots of room for RCTs and a wide range of other tools.