As the COVID-19 pandemic continues to rage and public funding is increasingly scarce, the need for high-quality, timely evidence on the effectiveness of public programs has never been clearer. Given that impact evaluations are uniquely suited to estimating the net impact of interventions or programs on desired outcomes, they play a particularly important role as part of the evidence and data ecosystem. But these methods have also elicited longstanding concerns, including that they cost too much and may take too long to provide results.
CGD recently launched a Working Group led by Amanda Glassman and Ruth Levine to consider how the next generation of investments in impact evaluations can optimize their returns for public policy decision-making. As part of this wider effort, we commissioned background research on the broad menu of recent advances in evaluation methods and data in order to examine tradeoffs related to the timeliness, cost, and policy relevance of different evaluation strategies.
We are pleased to share a new background paper by our colleague Ann-Sofie Isaksson (alongside a complementary piece focused on evidence-to-policy partnership models). The paper serves as a guide for funders and other stakeholders (within and beyond the development research community) on evaluation techniques that generate faster, lower-cost studies, and it highlights opportunities to put them into practice. (While the paper focuses on evidence generation, the importance of evidence synthesis in policy decision-making, especially during COVID-19, cannot be overstated, and synthesis raises parallel considerations related to speed, rigor, accessibility, and new platforms and tools for analysis.)
In this blog, we share a top-line summary of the methodological and data advances alongside recommendations for how to harness their potential to move the field forward. And on September 29, we’ll host Isaksson and other speakers for a CGD seminar to discuss the paper and related topics in more detail—we hope you’ll join us.
Methodological developments allowing for rapid impact evaluation
Several methodological tools could make impact evaluation evidence more usable and relevant for public policy decisions. Importantly, the approaches are not mutually exclusive; one evaluation might employ several of the methodologies discussed throughout the paper. Four of the most noteworthy are:
1. Evaluations with multiple treatment arms, e.g. A/B testing
Many private companies have recently integrated continuous experimentation into their operations through A/B testing and other analyses. As CGD’s Working Group examines opportunities for governments to embed routine evidence into their own decision-making, the widespread use of A/B testing to assess variations in program design is especially promising. For example, Banerjee et al. (2020) applied A/B testing to evaluate a COVID-19 prevention campaign in the Indian state of West Bengal, with several treatment arms receiving different messages.
However, evaluating multiple treatment arms often requires usable administrative data and large sample sizes in order to detect impacts of incremental changes, pointing to the need for greater support for data collection and infrastructure (discussed below).
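To see why incremental design changes demand large samples, consider a standard power calculation. The sketch below is an illustration under stated assumptions, not a method taken from the paper or from Banerjee et al.: it uses the normal approximation for a two-sided difference-in-means test, expresses effects in standard-deviation units, and applies a Bonferroni adjustment when several arms are each compared with one control. The function name `n_per_arm` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(mde, sd=1.0, alpha=0.05, power=0.8, comparisons=1):
    """Approximate sample size per arm for a two-sided difference-in-means test.

    Normal approximation: n = 2 * (z_{1 - a/2} + z_{power})**2 * sd**2 / mde**2,
    where a = alpha / comparisons (a Bonferroni adjustment, assumed here).
    """
    std_normal = NormalDist()
    adjusted_alpha = alpha / comparisons
    z_alpha = std_normal.inv_cdf(1 - adjusted_alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd**2 / mde**2)


if __name__ == "__main__":
    # Minimum detectable effects in standard-deviation units (illustrative values only).
    for mde in (0.2, 0.1, 0.05):
        print(f"MDE {mde:.2f} SD, one comparison: {n_per_arm(mde)} per arm")
    # Three treatment arms, each compared with a shared control arm (four arms in total).
    n = n_per_arm(0.1, comparisons=3)
    print(f"MDE 0.10 SD, three arms vs. control: {n} per arm, {4 * n} in total")
```

Halving the detectable effect roughly quadruples the required sample per arm, and each additional arm adds both its own sample and a stricter significance threshold, which is why multi-arm designs lean so heavily on large administrative datasets.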
2. Adaptive/iterative evaluation
To enable real-time program adaptation and inform rapid policy responses, evaluations can be set up with multiple waves of data collection (through low-cost remote surveys, for example) and ongoing engagement with implementers. For example, a 2020 evaluation by Angrist et al. of the extent to which low-tech interventions limit pandemic-related learning loss in Botswana collected data in multiple rounds at four- to six-week intervals to facilitate program adaptation. Similarly, Caria et al. (2021) used “adaptive targeted experimentation” to assess the impact of labor market policies on Syrian refugees in Jordan, observing treatment outcomes over time and adaptively optimizing treatment assignment for participants.
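The logic of adaptive assignment can be sketched with Thompson sampling, a common bandit algorithm: after each wave, the estimated success rate of each arm is updated, and the next wave is assigned by drawing from those estimates, so better-performing arms receive more participants. This is a simplified, hypothetical illustration, not the procedure Caria et al. (2021) used; it assumes a binary outcome (e.g., employed or not), uniform Beta(1, 1) priors, simulated outcomes, and no targeting by participant characteristics. The function names and response rates are invented for the example.

```python
import random


def thompson_assign(successes, failures, rng):
    """Draw once from each arm's Beta posterior and return the arm with the highest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_adaptive_experiment(true_rates, waves, per_wave, seed=0):
    """Simulate a multi-wave experiment whose assignment adapts between waves.

    true_rates are hypothetical success probabilities for each arm; in a real
    evaluation they are unknown and outcomes come from follow-up data collection.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures, assigned = [0] * k, [0] * k, [0] * k
    for _ in range(waves):
        # Assign the whole wave using only outcomes observed in earlier waves.
        wave = [thompson_assign(successes, failures, rng) for _ in range(per_wave)]
        # Observe this wave's (simulated) outcomes, then update the counts.
        for arm in wave:
            assigned[arm] += 1
            if rng.random() < true_rates[arm]:
                successes[arm] += 1
            else:
                failures[arm] += 1
    return assigned, successes, failures


if __name__ == "__main__":
    assigned, successes, _ = run_adaptive_experiment([0.10, 0.20, 0.35], waves=8, per_wave=100)
    for arm, (n, s) in enumerate(zip(assigned, successes)):
        rate = s / n if n else float("nan")
        print(f"arm {arm}: {n} participants, observed success rate {rate:.2f}")
```

The design trade-off is visible in the output: concentrating participants in the leading arm benefits them, but it leaves fewer observations in the other arms, which makes comparisons among those arms less precise.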
3. Context-specific, smaller scale impact evaluations
Using this approach, researchers aim to align the evaluation’s objectives with the implementer’s temporal, budgetary, operational, and political considerations in order to inform a specific decision. Although this may not always be suited to creating generalizable knowledge across settings, putting implementer needs and constraints at the center of evaluation design can help create a glide path for policy uptake.
For example, Wang et al. (2016) demonstrated in just over three months that providing a package of childcare items increased facility-based deliveries in Zambia. This quick turnaround helped inform the government’s decision to update its national policy to provide “mama kits” to all health facilities just nine months after the evaluation was first commissioned, illustrating how adapting to the implementer’s priorities can strengthen policy relevance.
4. Quasi-experimental methods
While econometric techniques for finding “control” groups that are statistically similar to “treated” populations are not new, noteworthy techniques such as synthetic controls, “surrogate” proxies, and machine learning (ML) predictions illustrate the increasingly sophisticated tools at researchers’ disposal. These techniques are often used in combination with novel data sources, including granular spatial data and large administrative data sets, to control for potential confounding variables at finer geographic levels and to assess relevant outcomes.
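As an illustration of one of these techniques, the sketch below fits a bare-bones synthetic control: it chooses non-negative donor weights that sum to one so that a weighted average of untreated units tracks the treated unit before the intervention, then reads the post-intervention gap as the estimated effect. It is a simplified, hypothetical example with invented data; it matches only on pre-treatment outcomes (full implementations typically also weight covariates) and uses projected gradient descent, one of several ways to solve the constrained least-squares problem.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(treated_pre, donors_pre, iters=2000):
    """Donor weights minimizing the squared pre-treatment gap, via projected gradient descent.

    treated_pre: T pre-treatment outcomes for the treated unit.
    donors_pre: J donor series, each with T pre-treatment outcomes.
    """
    J, T = len(donors_pre), len(treated_pre)
    w = [1.0 / J] * J
    # A step of 1 / (2 * squared Frobenius norm) never exceeds 1 / L for this objective.
    step = 1.0 / (2.0 * sum(x * x for series in donors_pre for x in series))
    for _ in range(iters):
        residual = [treated_pre[t] - sum(w[j] * donors_pre[j][t] for j in range(J)) for t in range(T)]
        grad = [-2.0 * sum(residual[t] * donors_pre[j][t] for t in range(T)) for j in range(J)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(J)])
    return w


def effect_gaps(treated_post, donors_post, weights):
    """Treated outcome minus the synthetic control's outcome, period by period."""
    return [
        y - sum(wj * series[t] for wj, series in zip(weights, donors_post))
        for t, y in enumerate(treated_post)
    ]


if __name__ == "__main__":
    # Invented data: five pre-treatment and two post-treatment periods, three donors.
    donors_pre = [[4, 1, 2, 3, 1], [1, 4, 1, 2, 3], [2, 1, 4, 1, 2]]
    treated_pre = [3.1, 1.9, 1.7, 2.7, 1.6]
    donors_post = [[2, 3], [4, 1], [1, 1]]
    treated_post = [3.6, 3.4]
    w = synthetic_control_weights(treated_pre, donors_pre)
    print("weights:", [round(x, 3) for x in w])
    print("estimated effects:", [round(g, 3) for g in effect_gaps(treated_post, donors_post, w)])
```

Because the invented treated series was built as 0.7 times the first donor plus 0.3 times the second, with one unit added after treatment, the fit should recover weights close to 0.7, 0.3, and 0 and an estimated effect close to 1 in both post-treatment periods.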
Data developments enabling speedier impact evaluation
Technological advances in WiFi, cell phones, GPS, and satellite imagery have made it more feasible to gather and share data, and new types of software make this data easier to combine, analyze, and use. Isaksson reviews the strengths and weaknesses of five data sources with ample promise for more relevant impact evaluations:
- Geocoded survey data
- Administrative data
- Remotely sensed data
- Low-cost remote surveys
- Big data from ML
These data sources offer numerous benefits for more rapid impact evaluations. Remote surveys, including computer-assisted telephone interviewing (CATI) and SMS surveys, which have been used extensively during COVID-19, give researchers the flexibility to design their own survey instruments at relatively low cost and can be conducted rapidly using existing sample frames. For example, the Cox’s Bazar Panel Survey tracks a representative sample of displaced Rohingya households and their host communities and is explicitly designed to be a “sandbox” testing environment that streamlines data collection for numerous evaluations.
Further, evaluations using geocoded data from comprehensive survey projects like the DHS and Afrobarometer allow for flexibility in the level of analysis (ranging from the impact of a single intervention to the impact of a specific donor’s projects to the impact of all sector-specific projects across several countries), making them useful for governments and other stakeholders interested in a broader understanding of development effectiveness.
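One common way such geocoded designs work is to link survey clusters to project sites by distance, comparing clusters near projects that had started before the survey with clusters near projects that had not yet started. The sketch below is a hypothetical illustration of that idea, not the paper’s procedure; the 10 km radius, field names, and coordinates are assumptions. DHS cluster coordinates are also deliberately displaced to protect respondents’ privacy, which is one reason not to set the radius too tight.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def classify_cluster(cluster, projects, survey_year, radius_km=10.0):
    """Label a survey cluster 'treated', 'control', or None (not near any project).

    treated: within radius_km of a project that started by the survey year.
    control: near only projects that started after the survey year.
    """
    nearby = [
        p for p in projects
        if haversine_km(cluster["lat"], cluster["lon"], p["lat"], p["lon"]) <= radius_km
    ]
    if any(p["start_year"] <= survey_year for p in nearby):
        return "treated"
    if nearby:
        return "control"
    return None


if __name__ == "__main__":
    # Hypothetical project sites and survey clusters.
    projects = [
        {"lat": 0.05, "lon": 0.0, "start_year": 2010},
        {"lat": 1.05, "lon": 1.0, "start_year": 2018},
    ]
    clusters = [{"lat": 0.0, "lon": 0.0}, {"lat": 1.0, "lon": 1.0}, {"lat": 5.0, "lon": 5.0}]
    for c in clusters:
        print(c, classify_cluster(c, projects, survey_year=2015))
```

Using projects that were planned but not yet started as the comparison group is meant to hold constant the factors that drew a project to a location in the first place.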
But several outstanding challenges remain. Communities with less access to digital tools may be underrepresented in certain data sources. For example, while Aiken et al. (2020) found ML methods using mobile phone data to be just as accurate as standard surveys in identifying ultra-poor households eligible for program benefits in Afghanistan, the study also explores how data biases risk perpetuating digital divides. Notably, Isaksson also recognizes that examples of impact evaluations using remotely sensed and ML data are still quite scarce. Capacity strengthening for geospatial impact evaluation, for example, could be immensely valuable, since the large quantity of available data and tools currently outstrips researchers’ capacity to use them.
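Accuracy claims of this kind are usually summarized with targeting error rates, and computing them separately by subgroup is one simple way to check whether a data source underserves people with less digital access. The sketch below is a hypothetical illustration with invented data, not the metrics or data of Aiken et al. (2020): exclusion error is the share of truly eligible households a rule misses, and inclusion error is the share of selected households that are not eligible.

```python
import math


def targeting_errors(selected, eligible):
    """Return (exclusion_error, inclusion_error) for binary selection decisions.

    exclusion error: share of eligible households that were not selected.
    inclusion error: share of selected households that are not eligible.
    Either rate is NaN when its denominator is zero.
    """
    pairs = list(zip(selected, eligible))
    n_eligible = sum(1 for _, e in pairs if e)
    n_selected = sum(1 for s, _ in pairs if s)
    missed = sum(1 for s, e in pairs if e and not s)
    wrongly_included = sum(1 for s, e in pairs if s and not e)
    exclusion = missed / n_eligible if n_eligible else math.nan
    inclusion = wrongly_included / n_selected if n_selected else math.nan
    return exclusion, inclusion


def errors_by_group(selected, eligible, groups):
    """Targeting error rates computed separately for each subgroup label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = targeting_errors([selected[i] for i in idx], [eligible[i] for i in idx])
    return result


if __name__ == "__main__":
    # Invented data: 1 = selected by a hypothetical phone-data rule / truly eligible.
    selected = [1, 1, 0, 0, 1, 0]
    eligible = [1, 0, 1, 0, 1, 0]
    groups = ["phone", "phone", "no_phone", "no_phone", "phone", "no_phone"]
    print("overall:", targeting_errors(selected, eligible))
    for g, (excl, incl) in errors_by_group(selected, eligible, groups).items():
        print(f"{g}: exclusion {excl:.2f}, inclusion {incl:.2f}")
```

In this toy example the rule misses one in three eligible households overall but misses every eligible household in the group without phones, the kind of pattern that aggregate accuracy figures can hide.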
The “so what?”: How funders, implementers, and researchers can unlock the potential of rapid, rigorous evaluations
The background paper outlines ways in which the development community can collectively work towards maximizing the usability and relevance of these data sources and methods for real-world decision-making. In addition to scaling up support for the context-specific, adaptive approaches described above, funders, evaluators, and implementers should:
1. Take a flexible approach to methods and data
Across the research toolkit, a flexible approach to choosing which methods and data to use is key to meeting policymakers where they are. There is no one-size-fits-all approach; the appropriate way to commission and conduct evaluations depends on the policymakers, objectives, and constraints involved.
2. Build data infrastructure
Tapping into the wide variety of readily available data (e.g., administrative data and geographically referenced survey data) would make impact evaluations much faster and cheaper. However, this requires greater investment in data collection, quality, and infrastructure (including national statistical systems) so that these data sources can be linked to each other and made accessible and usable for researchers and program implementers, as discussed in the 2021 World Development Report, Data for Better Lives, and by a CGD Working Group on Governing Data for Development. In this vein, the evidence-based policymaking community has an opportunity to collaborate more closely with the data-for-development community and to leverage digitalization to demonstrate the power of evaluation.
3. Invest in system-wide functions along the entire evidence continuum
Realizing the potential of innovative analytical tools requires supporting those who can analyze these data and facilitating their engagement with policymakers to ensure evidence uptake. For example, France’s newly launched Fund for Innovation in Development offers a designated pot of resources to strengthen government capacity to evaluate innovations, bring them to scale, and institutionalize them in large-scale policies. Without such investments along the entire evidence continuum to feed into system-wide functions, one-off projects will continue to fall short of informing real-world decision-making. It is also crucial to develop the long-term relationships, institutional processes, and related infrastructure that can turn rich data sources into integrated information and ultimately enable policy action.
The paper calls for expanded use of faster, less expensive methods that remain reliable and rigorous, with better public policy as our collective bottom line. Increasing the speed, decreasing the cost, and broadening the applications of rigorous impact evaluation methods will require more policy-relevant approaches from evaluators, increased and flexible financing from funders, and value-driven decisions from policymakers. Together, these groups can help translate data and methodological developments into better evidence that leads to better policy outcomes.
Join us on Wednesday, September 29 at 9:30am ET for a CGD seminar to discuss this work in more detail. Ahead of and during the event, please share your reactions and questions in the comments section below, on Twitter @CGDev #CGDtalks, or by email at email@example.com. We look forward to hearing from you.