UNAIDS, WHO, PEPFAR and the Global Fund for AIDS, TB and Malaria (GFATM) all depend on long-run projections in order to make the case for increased attention and financing for AIDS. This dependency is a response to the reality that HIV is a slow epidemic with extraordinary “momentum”. Even small changes in the course of new infections require years to implement and have health and fiscal consequences for decades thereafter. According to the UNAIDS web site, “[s]ince 2001, the UNAIDS Secretariat have led cutting-edge international work to define and project the developing world’s HIV/AIDS financing needs.” In 2007 UNAIDS published estimated future resource needs here. The GFATM used projection models to argue unsuccessfully for sustained funding here. And according to Congressional testimony here, PEPFAR has “looked at the impact of combination interventions on HIV infection rates, applying sophisticated modeling techniques to a generalized, high-prevalence context, and found that infections could be cut by more than half.” All of these projections were produced by one modeling group, The Futures Institute, with their suite of modeling tools, called GOALS, which is available as a free download here.
Computer models are also at the heart of the active policy debate over the degree to which the AIDS community should depend on AIDS treatment as a way to prevent future infections. The possibility that the so-called “test-and-treat” proposal, which I have blogged here, could eliminate the AIDS epidemic is contested by other modelers using other models here.
But often these disputes occur above the heads and out of sight of the policy makers, US Congressional staffers and other consumers of these long-run projections, who have few grounds on which to judge their plausibility. What are the questions that consumers of these projections should (and should not) be asking?
First of all, it is clear that consumers should not be asking whether these estimates are “correct,” because no model is ever “correct,” but whether they are “plausible” and internally consistent guesses that obey some fundamental adding-up constraints. A further criterion is whether they convey to the decision maker a realistic appreciation of the impact that policy decisions will have on the AIDS epidemic and also, importantly, the *uncertainty* around the “best guess” scenario. Finally, one might ask whether policy makers exposed to the model results will be less likely to make decisions today that they or their successors will regret ten years down the road.
In approaching the GOALS model, or any other model of the future course of the AIDS epidemic, the consumer might be better armed for critical engagement if he or she understands that any set of model projections is dependent on both the structure of the model and the data or “parameters” that populate that structure. As an annex to this posting, I present a few questions that the model consumer might raise, divided into issues of “structure” and issues of “parameter estimation or data”.
Having considered all the issues I raise below, should we be skeptical about the modeling results promulgated by UNAIDS, WHO and PEPFAR, which are all based on a single model? Yes, I think we should. First, we should ask how this particular model is constructed and how its parameters are estimated. Were these the best choices to inform the policy questions under discussion? And we should also ask for modeled predictions of the effect of alternative AIDS policies to be replicated by various groups of modelers. We should ask whether each model has been validated by being subjected to a barrage of independent tests. As is the case for projections of the future of the US economy, we should be asking for “consensus models” or perhaps for the “consensus forecast” of a group of modelers.
To correct the market failure caused by insufficient academic rewards for impact evaluation, various public sector financiers have seen the wisdom of establishing specialized impact evaluation institutions like NICE and 3IE. Similarly, the academic community provides few rewards for the mundane task of replicating already published, agency-supported predictions of the future course of the global HIV epidemic. As in the case of impact evaluation, there is a strong justification for public support of an institution that would facilitate and underwrite public comparisons and even competitions among epidemiological projection models for AIDS and other long-cycle epidemics. An important principle of such an institution would be that a model is run through its paces and evaluated on criteria like those I propose below by someone other than its author. For example, why not turn loose squads of graduate students on each of the available models? Do I have any volunteers?
Questions to ask of any model:
Issues of Structure
A model’s structure, like the structure of an airplane, affects not only whether it flies at all (i.e. whether it can make plausible predictions of future trends based on past trends), but also its behavior in response to its pilot’s guidance (i.e. what it predicts will happen as a result of a policy change). Predicting the continuation of a past trend is relatively easy. Correctly predicting the response of a dynamic system like the AIDS epidemic to policy changes is a much more daunting challenge for the modelers. And structure plays a particularly important role in the latter.
In the structure category, I would include characteristics of the epidemiological model of HIV transmission, such as whether it is “compartment-based” or “agent-based”. A “compartment-based” model characterizes a population of people by allocating each person to a stage or a compartment and then specifying equations to describe how people move or transition from one compartment to another. The Wikipedia entry gives a good introduction here. Instead of compartments containing aggregates of people, the components of an agent-based model each represent an individual person who sequentially “decides” how to “behave” in response to a sequence of situations or events to which that simulacrum is exposed. Again Wikipedia has a detailed description here.
For some purposes, such as understanding the impact of concurrent sexual partners on the spread of HIV, an agent-based model is thought to be better suited. Since GOALS is compartment-based, it is fair to ask whether it can successfully capture concurrency and, if not, how much damage that inability does to its projections. If one accepts using a compartment-based model, because of its relative simplicity, it then becomes relevant to know how many compartments there are, what the transition probabilities are across them, etc. The equations that link the compartments constitute the most basic description of the structure. But these are hard for most of us to parse, so a diagram and sensitivity analysis would be helpful.
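To make “compartments” and “transitions” concrete, here is a minimal sketch in Python of a two-compartment model (susceptible and infected). It is not the GOALS model. The function name, the parameter values, and the assumption that every person who leaves the infected compartment is replaced by a new susceptible person are all hypothetical and chosen only for illustration.

```python
def simulate_si(population=1000.0, infected0=10.0, beta=0.3, exit_rate=0.1,
                years=30, steps_per_year=10):
    """Toy two-compartment model; returns HIV prevalence at every time step.

    beta      -- hypothetical transmission rate per year
    exit_rate -- hypothetical rate at which people leave the infected compartment
    """
    dt = 1.0 / steps_per_year
    s, i = population - infected0, infected0
    prevalence = [i / (s + i)]
    for _ in range(years * steps_per_year):
        n = s + i
        # Transition susceptible -> infected (frequency-dependent mixing).
        new_infections = beta * s * i / n * dt
        # Transition out of the infected compartment.
        exits = exit_rate * i * dt
        # Illustrative assumption: each exit is replaced by one new
        # susceptible entrant, so the total population stays constant.
        s += exits - new_infections
        i += new_infections - exits
        prevalence.append(i / (s + i))
    return prevalence


if __name__ == "__main__":
    baseline = simulate_si()
    lower_transmission = simulate_si(beta=0.15)
    print("final prevalence, baseline:", round(baseline[-1], 3))
    print("final prevalence, beta halved:", round(lower_transmission[-1], 3))
```

Even this toy has two transition rates whose values drive the whole projection. Rerunning it with one rate changed, as in the last lines, is the simplest form of the sensitivity analysis asked for above.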
In addition to these structural characteristics, a model has “emergent” characteristics – which are only revealed by running it many times to check its sensitivity to alternative assumptions. For example, the figures that I used in two previous blogs (here and here) emerge from about 5000 runs each of my AIDSCost model. The GOALS model could similarly be run multiple times in order to trace out response frontiers which would more clearly reveal the emergent properties of its underlying structure than do a handful of runs.
Similarly, for the GOALS model, which is used to construct most of the projections cited in the first paragraph of this blog, I am curious whether the relationship between various prevention interventions is additive or synergistic or possibly one of substitution. That is, when two prevention interventions are expanded to scale jointly, does the GOALS model predict that their effects on averted HIV infections would be the simple sum of their independent effects, or more than the sum (synergy) or less than the sum (redundancy)? While an analysis of the equations of the model would yield clues to the answer to this question, running the model for a variety of intervention combinations and tracing out response frontiers would be more informative.
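The additive-versus-synergistic question can be put to any projection model as a simple four-run experiment: no intervention, each intervention alone, and both together. The sketch below assumes a hypothetical `model(level_a, level_b)` function that returns projected infections; the two example models are invented stand-ins, not anything taken from GOALS.

```python
def interaction_gap(model, level_a=1.0, level_b=1.0):
    """Infections averted jointly minus the sum of infections averted singly."""
    base = model(0.0, 0.0)
    averted_a = base - model(level_a, 0.0)
    averted_b = base - model(0.0, level_b)
    averted_joint = base - model(level_a, level_b)
    return averted_joint - (averted_a + averted_b)


def classify(gap, tol=1e-9):
    """Label the interaction implied by the gap."""
    if gap > tol:
        return "synergy"
    if gap < -tol:
        return "redundancy"
    return "additive"


# Hypothetical stand-in models (infections per year), for illustration only.
def linear_model(a, b):
    # Each intervention removes a fixed number of infections.
    return 100.0 - 10.0 * a - 20.0 * b


def multiplicative_model(a, b):
    # Each intervention removes a fixed share of the remaining infections.
    return 100.0 * (1.0 - 0.5 * a) * (1.0 - 0.5 * b)


if __name__ == "__main__":
    for name, model in [("linear", linear_model),
                        ("multiplicative", multiplicative_model)]:
        print(name, classify(interaction_gap(model)))
```

In the multiplicative stand-in, each intervention alone averts 50 infections but the pair averts only 75, so the gap is negative. A dynamic epidemic model could fall on either side, which is why the runs are more informative than the equations alone.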
For a dynamic model, the mathematics suggests that emergent properties are particularly likely to be surprising and unpredictable from structure alone when the model is non-linear. Nonlinearities can occur in the epidemiology (e.g. from a standard SIS model) or from any of the areas of structure I list below.
Structure of the cost of supplying services
For models like my AIDSCost model and the Futures Institute’s Model, which project the cost of the supply of delivered HIV treatment or prevention services, it becomes relevant to ask whether the structure of the cost model is linear (i.e. constant unit costs) or more realistic. For example, can the cost structure of the model capture economies of scale (at the national or the facility level), economies of scope (ditto), economies of integration with the health system (ditto), and economies that accrue to competitive service delivery as opposed to hierarchically controlled monopolistic service delivery (whether public or private)? All of these nonlinearities are relevant to projecting the future costs of HIV service delivery, but their relative magnitudes and the degree to which they are amenable to policy manipulation are empirical questions that have not yet been answered – or in some cases even addressed.
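The difference between a linear cost structure and one with economies of scale can be seen in a few lines. This is a minimal sketch under invented assumptions: the functional form (a fixed cost plus a variable cost that rises less than proportionally with patient numbers) and every number in it are hypothetical, not estimates from AIDSCost or GOALS.

```python
def linear_cost(patients, unit_cost=500.0):
    """Constant unit cost: total cost is proportional to patients served."""
    return unit_cost * patients


def scale_cost(patients, fixed_cost=100_000.0, variable_cost=800.0, exponent=0.9):
    """Hypothetical cost curve with economies of scale.

    A fixed cost is spread over more patients, and an exponent below 1
    makes variable cost grow less than proportionally with volume.
    """
    return fixed_cost + variable_cost * patients ** exponent


if __name__ == "__main__":
    for patients in (1_000, 10_000, 100_000):
        print(patients,
              round(linear_cost(patients) / patients, 1),
              round(scale_cost(patients) / patients, 1))
```

Under the linear structure the cost per patient never changes, so scaling up tenfold costs exactly ten times as much. Under the second structure the cost per patient falls as volume grows, so the projected bill for scale-up depends heavily on which structure the model assumes.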
Structure of the determinants of service uptake (i.e. of the “demand” for services)
In order to make plausible predictions of the cost of achieving any given degree of future service uptake or utilization, it is necessary to model not only the cost of supplying services but also the demand for those services. (Econ 101: Utilization is the intersection of supply and demand.) Thus one must ask how any projection model captures the demand for services. It is well known that demand for any service is elastic to varying degrees with respect to price, distance, convenience, attractiveness, and the price, distance, convenience and attractiveness of substitute and complement goods and services. What assumptions does the GOALS model make about these elasticities?
(To my knowledge the only AIDS cost projection model that incorporates demand elasticities is that done for Thailand by Tim Brown, Wiwat Peerapatanapokin, myself and co-authors here. For example, demand elasticities do not appear in my AIDSCost model.)
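For readers unfamiliar with elasticities, the sketch below shows one common way a demand response could be written: a constant-elasticity form in which uptake changes by a fixed percentage for each one percent change in price or distance. The function name, the choice of functional form and the elasticity values are assumptions for illustration only; they are not drawn from the Thailand model or from GOALS.

```python
def projected_uptake(baseline_uptake, price_ratio=1.0, distance_ratio=1.0,
                     price_elasticity=-0.5, distance_elasticity=-0.3):
    """Constant-elasticity demand sketch.

    price_ratio and distance_ratio are new value / baseline value and must
    be positive, so this form cannot represent a completely free service.
    The elasticities are hypothetical placeholders, not estimates.
    """
    return (baseline_uptake
            * price_ratio ** price_elasticity
            * distance_ratio ** distance_elasticity)


if __name__ == "__main__":
    print("baseline:", projected_uptake(1000))
    print("price doubled:", round(projected_uptake(1000, price_ratio=2.0)))
    print("distance halved:", round(projected_uptake(1000, distance_ratio=0.5)))
```

A model with no demand side is implicitly setting both elasticities to zero, which means that uptake is assumed not to respond to price or distance at all.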
Uncertainty can influence a model’s predictions either as part of a model’s structure or by way of its data. Some models embody the view that all human and natural phenomena are fundamentally stochastic and therefore make a random draw from a probability distribution at every point that an arithmetic computation is performed. Other models are a mix of deterministic computations and a few stochastic components. Still others are fundamentally deterministic, but could be run many times with randomly distributed parameter values. I believe that GOALS (like my AIDSCost model and many others) is in this latter category. Imagine two distributions of projected HIV prevalence in the year 2020. The two are produced from:
(1) a stochastic model run 1000 times with the same mean values for every parameter, yielding 1000 predictions for HIV prevalence in the year 2020,
(2) a deterministic model run with 1000 randomly chosen values of those same parameters, again yielding 1000 predictions for HIV prevalence in the year 2020.
Other things equal, which of these depictions of the uncertainty in future HIV prevalence is more plausible? This is a deep question, to which I don’t have an answer. I suspect a case could be made for the stochastic model, provided the details of its stochastic specifications are themselves plausible. But ultimately one would have to compare actual models.
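The two thought experiments can be set side by side in a toy example. The sketch below is an illustration under invented assumptions, not a description of GOALS or AIDSCost: the population size, the rates, the yearly time step and the uniform range from which the transmission rate is drawn are all hypothetical.

```python
import random
import statistics


def stochastic_run(rng, beta=0.3, exit_rate=0.1, n=200, i0=20, years=15):
    """Fixed parameters, but every infection and exit is a random event."""
    i = i0
    for _ in range(years):
        p_infection = beta * i / n  # annual risk for each susceptible person
        new = sum(rng.random() < p_infection for _ in range(n - i))
        exits = sum(rng.random() < exit_rate for _ in range(i))
        i += new - exits
    return i / n


def deterministic_run(beta, exit_rate=0.1, n=200, i0=20, years=15):
    """No random events; the only uncertainty is in the value of beta."""
    i = float(i0)
    for _ in range(years):
        i += beta * i / n * (n - i) - exit_rate * i
    return i / n


def compare(runs=200, seed=1):
    """Return final-year prevalence from both kinds of experiment."""
    rng = random.Random(seed)
    stochastic = [stochastic_run(rng) for _ in range(runs)]
    # Hypothetical parameter uncertainty: beta uniform on [0.2, 0.4].
    parameter = [deterministic_run(rng.uniform(0.2, 0.4)) for _ in range(runs)]
    return stochastic, parameter


if __name__ == "__main__":
    stochastic, parameter = compare()
    print("stochastic model, spread:", round(statistics.pstdev(stochastic), 3))
    print("parameter draws, spread:", round(statistics.pstdev(parameter), 3))
```

Which spread comes out wider depends entirely on the assumed population size and on the assumed range for the parameter, which is one reason the comparison has to be made between actual models rather than in the abstract.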
Given the stochastic structure of any specific model, the question then arises how to convey truthfully to policy makers the uncertainty contained in model predictions. This is a difficult communication challenge. Although the UNAIDS modelers wanted for years to release upper and lower bounds for their estimates of AIDS prevalence, UNAIDS only began to publish ranges rather than single estimates after data accumulated that they had badly over-estimated the worldwide total number of HIV infections for decades. See my discussion of the revision here.
“A model should be as simple as possible, but no simpler.” (Einstein said this but so did the Lord of Occam before him.) Of course, parsimony, like beauty, is in the eye of the beholder, so one person’s “beautifully parsimonious” model is another’s “overly reductionist caricature” of reality. While model builders often like to add bells and whistles to their models, there is a serious danger that the fillips and adumbrations they add to a model’s basic structure will, like the epicycles added to the Ptolemaic model of the solar system, lead the model farther and farther away from reality. (The Ptolemaic model, with the earth at the center instead of the sun, predicted the motions of the planets across the sky pretty well but would have done a really bad job of predicting the impact of a policy intervention – such as blasting a rocket towards Mars.) Thus all of the complexity that I suggest above should be introduced only to the degree that it improves the plausibility of the model’s predictions and the usability of the model for policy analysis. Anybody like me who would like to see some additions to a model must make a convincing case that greater complexity would be worth the loss of parsimony.
Issues of Parameter “guesstimation” and Data
A model’s structure, with the components described above, is just a set of equations with unknown parameters. In order to make predictions, we must of course attach values to those parameters. For a simple model of demand, when one has observations of 1000 individuals choosing how much detergent to buy at 1000 different combinations of price, distance, characteristics and the price, distance and characteristics of substitute products, one has enough degrees of freedom to formally estimate the dozen or so parameters of the structure of detergent demand. Unfortunately for these epidemiological-economic projection models, we are in the opposite situation, with perhaps ten times as many parameters as we have data points. So instead of estimating those parameters, we have to “guesstimate” them. The French call it “la pifometrie”. And that’s appropriate, because the process of attaching values to these parameters, we would all agree, requires one to hold one’s nose.
Perhaps it is useful to distinguish among: epidemiological parameters, biological or medical parameters, efficiency parameters, effectiveness parameters and demand parameters.
Epidemiological parameters. These include the shares of the various risk groups in the population and the baseline rates of activity and the rate of sexual mixing between the various groups. Depending on the structure of the model, a measure of concurrency or a measure of mean partnership duration is also required, for each compartment in the model. Important theoretical work by Anderson and May has shown that accurate prediction requires information about not only the mean but also the variance of each of these numbers, but whether it would be useful to know the variance within each separate compartment or only the variance across all compartments (which could be deduced from the distribution of their mean values) is unclear to me.
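One standard way to see why the variance matters is the textbook result usually attributed to Anderson and May for a population in which partners are chosen in proportion to activity. The symbols below are introduced here for illustration: β is the transmission probability per partnership, D the duration of infectiousness, m the mean rate of partner change and σ² its variance.

```latex
R_0 = \beta \, c \, D, \qquad c = m + \frac{\sigma^2}{m}
```

Two populations with the same mean of 2 new partners per year therefore behave very differently: if everyone behaves identically (σ² = 0) the effective rate c is 2, but if the variance is 8 the effective rate is 2 + 8/2 = 6, tripling the basic reproduction number.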
Biological or medical parameters. These include features of the natural history of HIV and the degree of infectiousness of an infected person in the various compartments through which s/he passes and the susceptibility of the uninfected individual. If acquired and used, condoms and microbicides intervene at this point to reduce an individual’s infectiousness and susceptibility.
Efficiency parameters. These include the parameters of the structure of the cost of production and distribution of HIV services.
Effectiveness parameters. These include the effectiveness of a service at preventing an HIV infection and/or prolonging the life of an infected person – assuming that the service is used. But this is a big assumption. To relax this assumption, a model would need to have a demand structure and …
Demand parameters. These include the responsiveness of utilization to changes in the policy instruments that governments and donors can manipulate, such as the price, distance, convenience and attractiveness of a service and of its substitutes and complements.
With respect to each parameter within each class of parameters, we can ask whether sufficient data exists to rigorously construct estimates of a mean and a confidence interval. Where insufficient data exists, we can ask whose subjective Bayesian priors are represented in the model, what those priors are and how sensitive the model’s predictions would be to the choice of alternative priors.
If you have gotten this far, you must be either really interested in modeling per se or really interested in whether HIV models are producing reliable predictions about the magnitude of the future health and fiscal burden of AIDS. If you are a model “consumer,” rather than a modeler, I’m curious whether this checklist seems helpful. Or would you rather just accept the model predictions from someone else – and then take them on faith?