In March, Mark Pitt leveled two major charges at the attempt by Jonathan Morduch and myself to replicate the Pitt & Khandker assessment of microcredit. Both charges highlighted discrepancies between our version of the statistical analysis and the original. One of those was that we left out a control variable (this post explains the error without excusing it). The other I will explicate now in a nontechnical way. What Pitt calls a mistake is actually a manifestation of a deeper and thornier issue. The upshot of the confrontation with this issue will not be, as in Bimodality in the Wild, the revelation of a fatal weakness. Rather, Jonathan and I offer a way to resolve the issue, and it leaves the PK conclusions intact. I thought you'd be interested nonetheless.
I began the pedagogic part of Bimodality in the Wild by fitting a line to a graph of height versus age for 100 hypothetical kids. Implicit in modeling the world with straight lines is an assumption of constant marginal impact. That is, if you believe my graph, then on average children grow the same number of centimeters per year whether they are 5 or 15. That's an unrealistic model since it doesn't allow for teenage growth spurts. In general, the assumption of constant impact is often patently unrealistic. Consider that for a capital-starved entrepreneur, the first $100 of microcredit might be really valuable. The next $100 might be pretty useful too, and so might the next...but at some point returns to additional capital would diminish. They would not be constant.
As a rough rule of thumb, you might think that going from $100 of credit to $1,000 would have about the same impact as going from $1,000 to $10,000. So we might imagine the impact of microcredit on household spending (an indicator of poverty) to look like this:
The flattening at the high end reflects diminishing returns to more microcredit.
A common statistical response to the prospect of diminishing returns is to view the world through logarithm-tinged glasses. In the logarithmic view of the world, 100 is halfway between 10 and 1,000. Here are the same data, but graphed with a logarithmic scale from left to right. Notice how 100 is now indeed at the midpoint along the bottom:
In effect, I made the second graph by horizontally stretching the left part of the first graph and compressing the right part. That turned the curve into a line.
When crunching numbers rather than making graphs, the equivalent step---done in Pitt & Khandker---is to "take the log" of quantities such as microcredit. Statistical analysis then proceeds with these "logged" values. It works something like this:
log 1,000,000 = 6
log 100,000 = 5
log 10,000 = 4
log 1,000 = 3
log 100 = 2
log 10 = 1
log 1 = 0
As you might expect, the logs of other numbers are fractions: since 5 is between 1 and 10, the log of 5 is between log 1 and log 10 (0 and 1). Approximately speaking, the log is the number of digits in a number, minus 1. Notice how 100 (whose log is 2) once more ends up halfway between 10 and 1,000 (logs of 1 and 3). This is just the change we want if we believe in diminishing returns and yet, for mathematical and computational convenience, want to fit straight lines to data, as motivated by the second graph above.
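For readers who like to check such things, here is a minimal Python sketch of the list above. It assumes base-10 logs, as in the list; the variable names are mine and purely illustrative.

```python
import math

# The amounts from the list above, smallest to largest.
amounts = [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]

# Base-10 log of each amount.
logs = {amount: math.log10(amount) for amount in amounts}
for amount, value in logs.items():
    print(f"log {amount:>9,} = {value:g}")

# On the log scale, 100 should sit halfway between 10 and 1,000.
midpoint = (math.log10(10) + math.log10(1_000)) / 2

# The log of a number between 1 and 10 is a fraction between 0 and 1.
log_of_5 = math.log10(5)
print(f"midpoint of log 10 and log 1,000: {midpoint:g}")
print(f"log 5 is roughly {log_of_5:.2f}")
```

The midpoint comes out equal to log 100, which is the sense in which 100 is "halfway" between 10 and 1,000.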
But there can be an ugly problem with taking logs, and it often gets swept under the rug. To see it, consider how my list above should be continued. Since in each new row we divide the number on the left by 10, the list should go on like this:
log 0.1 = -1
log 0.01 = -2
log 0.001 = -3
The problem is this: although the numbers being logged on this list are getting very small, they will never reach 0, so the list will never get to log 0. If anything, the log of 0 is negative infinity. It turns out that collapsing our data spectrum at the high end---bringing 1,000 closer to 100---comes at the cost of exploding it at the low end---sending 0 infinitely far from 1. So if a dutiful researcher who believes in diminishing returns is taking the log of all microcredit values in her data, what is she to do when she hits a non-borrower, someone whose borrowings are 0? 0 is nowhere on the list of log-able values. And statistical software can't handle negative infinity very well.
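To see the problem concretely, here is a small Python sketch. It is only an illustration: `log_or_none` is a hypothetical helper of my own, and other software reports the failure in other ways.

```python
import math

def log_or_none(amount):
    """Return the base-10 log of amount, or None where the log is undefined."""
    try:
        return math.log10(amount)
    except ValueError:
        # Python's math.log10 raises ValueError for 0 (and for negatives).
        return None

# Keep dividing by 10: the logs fall by 1 each time, without end.
shrinking = [math.log10(10 ** -k) for k in range(1, 8)]
print(shrinking)

# A non-borrower has borrowings of exactly 0, and there is no log to take.
non_borrower = log_or_none(0)
print(non_borrower)
```

However far the list is extended, it never produces a finite value for a borrowing of zero.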
The usual "solution" is to treat people who take no microcredit as taking a tiny bit---a dollar or a dime or a penny. Pitt & Khandker, Pitt explained in March, assigned non-borrowers a value of 1 Bangladeshi taka, which was about 2.5 cents. As far as I can tell, this choice was not documented before, so Jonathan and I went with 1,000 taka in the 2009 version of our paper, about $25. This was the smallest total amount that any family in the study actually borrowed in the five years leading up to data collection.
The next graph shows the PK data on household spending versus microcredit borrowing. There are three dots for each of the 1,798 households in the study, since each was visited by surveyors three times. Roughly speaking, this is the graph for which we'd like the best-fit line to slope upward, indicating a positive impact of microcredit on household spending. The point of the graph is to illustrate how, when viewing the world through log-tinged glasses, the two ways of treating non-borrowers are really different:
You can imagine that the best-fit line embodying the impact of microcredit on household spending would change a lot as one switched from putting the non-borrowers in that column on the far left (as PK did) to the one in the middle (as we did in 2009). And the slope is what tells you the impact of getting more microcredit. This is not a good state of affairs: an arbitrary, undocumented, unanalyzed choice can have a big impact on results.
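A toy calculation makes the point. The six households below are invented for illustration and are not the PK data; the spending figures are hypothetical logged values, and the fit is a plain least-squares line rather than the far more elaborate PK estimator.

```python
import math

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical households: (borrowing in taka, log of household spending).
# These numbers are made up; the first three households are non-borrowers.
households = [(0, 4.0), (0, 4.1), (0, 3.9),
              (1_000, 4.0), (10_000, 4.1), (100_000, 4.2)]

def slope_with_placeholder(placeholder):
    """Fit the line after replacing zero borrowing with a placeholder amount."""
    xs = [math.log10(b if b > 0 else placeholder) for b, _ in households]
    ys = [spending for _, spending in households]
    return ols_slope(xs, ys)

slope_1_taka = slope_with_placeholder(1)          # non-borrowers at 1 taka
slope_1000_taka = slope_with_placeholder(1_000)   # non-borrowers at 1,000 taka
print(slope_1_taka, slope_1000_taka)
```

With these made-up numbers the two slopes differ severalfold, even though nothing about the households changed except where the non-borrowers were placed.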
What's the right thing to do? With some reason, Pitt criticized us for treating people who got no microcredit as if they had actually borrowed 1,000 taka:
Quite simply, Roodman and Morduch arbitrarily assign 1000 units of treatment to the control group who were untreated. [Emphasis in original.]
But Pitt & Khandker's choice is at least as problematic. Consider what it implies if we are fitting straight lines to the data: moving from being a non-borrower (which they represent as borrowing 1 taka, on the left edge of the graph) to being a minimal borrower (borrowing 1,000 taka, three steps to the right on the log scale) has the same impact on household spending as moving from being that minimal borrower to borrowing 1,000,000 taka (three more steps to the right, beyond the right margin of this post). Does it seem plausible that putting one's toe in the microcredit water has the same proportional impact on welfare as borrowing another 999,000 taka? Seems extreme to me.
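The arithmetic behind that comparison can be sketched in a few lines of Python. The variable names are mine; the amounts are the ones in the paragraph above.

```python
import math

placeholder = 1          # taka assigned to non-borrowers, per Pitt
minimal_loan = 1_000     # smallest amount any family actually borrowed
large_loan = 1_000_000   # the comparison amount used above

# Distance travelled along the log scale in each move.
step_into_borrowing = math.log10(minimal_loan) - math.log10(placeholder)
step_to_large_loan = math.log10(large_loan) - math.log10(minimal_loan)
print(step_into_borrowing, step_to_large_loan)

# With a 1,000-taka placeholder instead, the first step shrinks to nothing.
step_with_1000_placeholder = math.log10(minimal_loan) - math.log10(1_000)
print(step_with_1000_placeholder)
```

Because a straight line assigns the same change in spending to equal distances along the horizontal axis, the two equal steps imply equal impacts. The 1,000-taka placeholder goes to the other extreme and implies that the first loan changes nothing at all.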
There is no easy answer here. (And the inverse hyperbolic sine transform does not solve the problem because it does not eliminate the arbitrariness.) Jonathan and I argue for ducking this issue by viewing microcredit use as binary: junk the information about how much people borrowed and just represent non-borrowers with a 0 and borrowers with a 1. One can then study the average impact of being a microcredit user. We show that this change strengthens PK's findings (see the "probit" column of Table 3)---which we still doubt for other reasons.
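One way to read the remark about the inverse hyperbolic sine is sketched below in Python. This is my illustration rather than the argument in our paper, and it assumes the arbitrariness in question is the choice of units: the transform is defined at 0, but the distance it puts between a non-borrower and a borrower depends on whether credit is counted in taka or in thousands of taka. The function names are hypothetical.

```python
import math

def asinh_gap(loan_taka, taka_per_unit):
    """Distance on the asinh scale between borrowing nothing and borrowing
    loan_taka, when credit is measured in units of taka_per_unit taka."""
    return math.asinh(loan_taka / taka_per_unit) - math.asinh(0)

# The same 1,000-taka loan, measured two ways.
gap_in_taka = asinh_gap(1_000, 1)           # credit counted in taka
gap_in_thousands = asinh_gap(1_000, 1_000)  # credit counted in thousands of taka
print(gap_in_taka, gap_in_thousands)

def is_borrower(borrowing_taka):
    """Binary view of microcredit use: 1 for any borrowing, 0 for none."""
    return 1 if borrowing_taka > 0 else 0

# The binary indicator does not care about units or placeholders.
print([is_borrower(b) for b in (0, 1_000, 1_000_000)])
```

The gap is several times larger when credit is counted in taka than when it is counted in thousands, so the choice of units plays much the same role as the choice of placeholder. The binary indicator sidesteps both, at the cost of discarding how much was borrowed.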
The larger lesson is once again about transparency. The inaccessibility of PK's computer code made it hard to determine how they treated non-borrowers. That failure to share, too, has been standard practice in social science. That should change.