Posted by Albert
I was recently invited by Bill Schambra, director of the Bradley Center for Philanthropy & Civic Renewal, to speak on a panel titled “Metrics Mania.” Panelists were asked to comment on an essay commissioned from Gary Walker, the founding director of Public/Private Ventures, on the subject of evaluating social programs. Here is the text of my remarks ...
Bill Schambra saw my role on this panel as the guy who, when asked about metrics, replies “metrics schmetrics,” but I’m going to have to disappoint him—to some extent. Measurement and evaluation, when done properly, are not just a bit of value-added for philanthropic or nonprofit work; they’re absolutely essential. Only a fool would disagree with that proposition.
By evaluation, though, I mean not just the kinds of formal evaluations described by Gary Walker in his essay, but informal evaluation as well: the kinds of course corrections we naturally make when we embark on a project, take a false step, and adjust what we do accordingly. Evaluation is not and should not be the sole province of the highly compensated consultant. We evaluate all the time; our own eyes and ears notice things the most astute consultant will never notice; and we’ll often be our own worst critics.
Now here’s where the metrics schmetrics comes in, perhaps: We’ve written more nonsense about evaluation than just about any other subject in philanthropy. Worries about evaluation, engendered in part by logic models the length of whale intestines, have become the math anxiety of the philanthropic world.
My general thesis—if I could call it that—is that from the perspective of somebody like Mr. Walker whose organization has been commissioned to conduct lucrative, large-scale evaluations of social programs (lucrative by nonprofit standards), the Impact Revolution might seem like a good thing. But from the ground, from the perspective of many people working in community-based organizations, this so-called revolution has brought with it new sources of irritation, new ways of adding meaningless make-work to already overburdened nonprofit staff members.
It has not been a people’s revolution, in other words, but rather one championed by elites—like myself, I’m afraid—sometimes unable to see far enough beyond our own measuring sticks to understand the limitations of formal evaluation techniques, and the trade-offs in staff time and other resources that these formal techniques require.
Some of these limitations have been well rehearsed—I’m thinking, for example, of the charge that this revolution has in many cases (not all) prompted us to measure the things that can be measured rather than the things that are important. Other limitations, like the absurdity of many logic models, are more technical and less well understood or discussed.
I’ll get to what I believe is perhaps the most challenging aspect of this revolution in a minute, but first I’d like to make a few quick comments about Gary’s paper:
After pointing out that “outcomes” are not the same as “impacts,” Gary comes to the startling conclusion that “the overwhelming majority of social programs with impact studies do not show a significant change in participants’ lives a year or two after the program.” He tries to soften the apparent harshness of this conclusion by claiming that these dismal results are an “artifact” of our approach to conducting these impact studies. He points out, for example, that these studies are typically conducted for new projects that haven’t been given the time or resources to work out their kinks. New projects, lots of problems that haven’t been worked out, dismal results.
As you ponder Gary’s startling claim, I urge you to keep the following in mind:
1. First, those of us who have worked in federally funded programs, like those evaluated by Gary’s organization, know how notoriously stingy the feds are when it comes to paying for program expenses and overhead. Throw in the Byzantine reporting requirements and I’d be surprised if any of these programs succeeded.
2. It’s difficult to assess Gary’s claim without having all the data in front of us. Is it possible that the same organizations conducting the impact studies might also have been called upon to fix the problems they identified? If so, couldn’t this have introduced a source of bias?
3. Given that Gary himself says that impacts might not be apparent for ten or fifteen years, how many of the impact studies he refers to were conducted over this very long time span? How can any impact study with a pretense to being scientific ever control for the bias introduced by the self-selection of the clients for a particular program? And the list of methodological worries goes on.
Unfortunately, the ink hadn’t had time to dry on Gary’s essay before his words were yanked out of their context and given pride of place in the invitation for today’s panel discussion. And here I quote: “Why is it that philanthropy has learned so much about metrics and yet has so little by way of measurable success to show for it?”
I vigorously reject both of the implied claims. I’m not convinced we’ve learned much about metrics—we’re doing more of it, perhaps, but we’re not doing it better; and I certainly reject the notion that philanthropy has little to show by way of measurable success.
But the damage is done. The meme is loose and I can’t call it back.
I resonate most with Gary’s essay when he writes that “the first things funders need to be accountable for is the quality of the program which they’re funding. That requires patience, and a use of funds for things like training ....” I can think of other investments funders should be willing to make, but I like how Gary’s words suggest a certain amount of care and support for grantees, a genuine collegiality.
And this gets to my primary worry about the metrics revolution: I find the image of a funder with a stopwatch in one hand and a clipboard in the other, hunched over a perspiring grantee, rather ghastly, frankly. It’s uncivilized, so clearly opposed to what I believe should be the ethos of the charitable sector, an ethos rooted in love for our fellow men and women, expressed through our work, and incorporating the values of cooperation and mutual support, among others.
Too often, however, we funders use evaluations like blunt weapons, barely understanding—if at all—the limitations of these tools, and certainly being unwilling, for the most part, to turn these weapons on ourselves.
A few very quick points about this impact revolution as seen from the ground, then I’m done.
1. I’d like to suggest an “outputs counter-revolution.” I find the whole progression from outputs to outcomes to impacts one of the great bugbears of contemporary thinking about evaluation. For those of you unfamiliar with this bit of chicanery, outputs are the things you do, like mentoring a young person for five hours a week; outcomes happen because of the things you do; and impacts, I assume, are outcomes that stick or that extend beyond our original goals.
I wouldn’t be surprised if someday we started requiring grantees to demonstrate not just their impacts but their—what would we call them?— hyperimpacts: the effects of a given social program on the afterlife, or on universes parallel to our own. Note that it would be absurd for us to call the gas company, thank them for their outputs (namely, the gas they deliver to our houses), and then complain that they haven’t demonstrated to us any outcomes or impacts. Why is it that we reserve this requirement for the people who work in the nonprofit sector?
And what’s so bad about outputs? As a donor, it’s enough for me to know that you delivered a quality youth-development program to 25 kids in a church basement who wouldn’t otherwise have the opportunity. For God’s sake, don’t incur the expense of trying to track the effect of your program on these kids ten or fifteen years down the pike. That would be a ridiculous waste of resources.
Unfortunately, all this talk of outputs, outcomes, and impacts blinds us to the fact that in many cases—again not all—simple outputs are all that we can reasonably hope for or require.
2. I don’t know how many of you have followed the development of “distributed computing.” In this model, instead of installing and running an entire application like Photoshop on your desktop, the application is distributed across two or more computers connected by a network. There are many advantages to this model, among them the ability to access and use the most up-to-date version of Photoshop without having to purchase the entire thing and load it onto your own machine.
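(A brief aside for the technically inclined: the sketch below illustrates the model I have in mind, assuming a toy image filter standing in for Photoshop. Everything in it—the handler name, the port, the brightness filter—is invented for illustration; the point is only that the application’s logic lives on a server and a thin client reaches it over the network.)

```python
# A minimal sketch of the distributed model described above. The
# "application" (here, a toy brightness filter) runs on a server;
# a thin client sends it work over the network and always gets the
# latest server-side version. FilterHandler, the port, and the
# filter logic are hypothetical stand-ins.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class FilterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the pixel values the client sent as a JSON array.
        length = int(self.headers["Content-Length"])
        pixels = json.loads(self.rfile.read(length))
        # The application logic lives here, on the server: a toy
        # brightness boost applied to each pixel value.
        result = [min(255, p + 10) for p in pixels]
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # A client on another machine POSTs pixel data to this server
    # and receives the processed result, without ever installing
    # the application itself.
    HTTPServer(("", 8000), FilterHandler).serve_forever()
```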
Just as we’ve seen the advent of distributed computing in the digital world, I’d like to suggest we try something like “distributed evaluation” in the nonprofit world. Here’s how it would work in the case of a youth development organization, to take one example. We assume that academics and others have already researched the question to death and determined which elements of a youth development program are likely to yield good outcomes for young people. If you’re a grantee, suppose we make it your responsibility to demonstrate that you’ve incorporated these elements into your program. We then make it part of my responsibility as a funder to know (because I’ve done the homework and read the literature) what these success-generating characteristics are, and to verify that they’re characteristics of your program. In this way, the burden of evaluation is shared three ways, and neither the funder nor the grantee needs to prove for the eleven-billionth time that young people respond well to nurturing environments that stimulate their hearts and minds.
3. I want to make clear that I’m not in the least anti-evaluation. As I've written elsewhere, I’m concerned that we tend to seek a kind of scientific or moral certainty from a formal evaluation where none exists. The questions that funders most often bring to an evaluator—Was this program worth our $25,000 investment? Should we continue funding it?—are questions only they, the funders, can answer. Say we measure a 25 percent drop in the truancy rate for a hundred kids in some program, and a 25 percent increase in their test scores. Is that worth $25,000 to you? Each donor needs to answer that question for him- or herself. As donors we will never be absolved of our responsibility to use our good judgment.
One of the greatest benefits an organization like Grantmakers for Effective Organizations can provide to the field is not a training on how to conduct evaluations—we have plenty of those—but a training on the questions that evaluations will never be able to answer. We might also benefit from being reminded that in a business context we often strive to convert all our currencies to a single coin—namely, money—but that in many nonprofit contexts, values like mercy, justice, and love frequently motivate decisions that don’t always make sense to the bottom line and whose effects can’t always be measured.
_____
Outstanding contribution to the entire evaluation debate!--wish I'd written it. Thanks so much for your clarity on the issue, and particularly for the idea of "distributed evaluation."
Posted by: Nonprofiteer | March 25, 2008 at 12:31 PM
A university ombudsman said something I remember twenty years later:
We only fully embrace our students as members of the community as they are leaving that community.
He said other things I remember, too, like 'Sex is worth the hearing loss.'
Quite an impact.
Posted by: Antoine Möeller | March 25, 2008 at 01:11 PM
Well, if you're going to lose your hearing anyway, might as well enjoy yourself while you're doing it.
"Impact" language in evaluation reminds me of the "target audience" language in communications. Very martial. Makes me want to duck behind a bale of hay.
Posted by: Albert | March 25, 2008 at 06:04 PM
Wow! What an outstanding post, Albert. I commented in my own post today.
So glad to have you posting regularly again. Keep it up!!
Posted by: Sean Stannard-Stockton | March 26, 2008 at 11:42 AM
Excellent piece! Great perspective on metrics and the fetishization of them.
Your concept of "distributed evaluation" is terrific. I recall a professional evaluator here in LA telling me many years ago that the best any nonprofit could do is exactly what you suggested: learn what research, done by academics and others with that expertise, has shown to be most effective (this reminds me, by the way, of the concept of "procedures," recommended practices based on experience, from the military or NASA), incorporate those practices into their programs, and make the argument to their funders that their work is designed in a way that should work. Unfortunately, the conversation usually flows the other way, driven by the funder's requests. And I imagine it will take a great deal of discipline for funders to refrain from asking grantees to collect data beyond the verifications of effective practices you describe in distributed evaluation. That's why you'll need the counter-revolution you call for, to make it possible.
Thanks for sharing this and all your posts.
Posted by: Pete Manzo | March 26, 2008 at 01:42 PM
Liked the essay, but I do wish to take issue with the following. You wrote:
"Note that it would be absurd for us to call the gas company, thank them for their outputs (namely, the gas they deliver to our houses), and then complain that they haven’t demonstrated to us any outcomes or impacts. Why is it that we reserve this nonsense for the people who work in the nonprofit sector?"
To carry your analogy forward, if the gas is the output, then the outcome would be the flame, and the impact would be the heat. We don't complain to the gas company about insufficient outcome and impact because the output is reliably high quality. However, if the gas company started producing bad output---if it started piping unburnable nitrogen rather than methane into our houses---then yes, very much so, we'd complain about insufficient outcome and impact: "Hey, my furnace shut down and it's freezing in here!"
Posted by: mugwumpiana | April 02, 2008 at 11:16 AM
Your comment raises an interesting question, mugwumpiana. What’s the real analogue of an outcome or an impact in the case of the gas company? My own inclination—not shared by you, apparently—is to look at gas, flame, and heat as outputs. After all, the path from the first to the last in the series is the mere lighting of a match. When unreconstructed funders inquire after outcomes, however, they’re typically looking for effects that have a less direct (and causally efficient) connection to the simple output. Suppose your program is helping a young child with his homework five days a week. One of these unreconstructed funders might ask, for example, “What has been the effect of your mentoring on the child’s grades and test scores?” (outcomes), “Did the child go to college and ultimately serve society by going to work at the gas company?” (impacts), and “Did the example of the child’s success encourage Tv@rrr, denizen of a parallel universe, to take up the snackletuner?” (hyperimpact).
Analogously, one might ask the gas company representative to demonstrate to us that our funding of their product produces a net benefit for the United States. Can this representative prove, by basing his answer on the analysis of a highly compensated consultant, that overall the gas industry doesn’t simply increase our dependence on fossil fuels, leading ultimately to tragic outcomes and impacts for all? Even if we don't go this far, it still seems silly to ask the representative to demonstrate the gas company's social outcomes.
That’s my analysis. I can understand how other people’s results might differ.
Posted by: Albert | April 02, 2008 at 12:41 PM
Hey Albert, thanks for the reasoned reply.
I wonder if the reason folks ask nonprofit grantees, but not the gas company, to demonstrate "impact" is that the whole raison d'être of a philanthropic foundation is to produce social benefit---impact---in amounts greater than what would have been produced by the tax revenue the government has failed to collect on the foundation's assets.
The gas company doesn't purport to do anything besides pump methane into your house---produce a simple output. It doesn't promise anything in the way of outcomes and impacts. What you do with the gas---short of blowing up the neighborhood---is your own business.
(On the other hand, much advertising "promises" outcomes and impact beyond the simple output of delivery of the product. When I write out a check to the Chevy dealer, the output I expect is the handing over of a set of keys to a shiny new Corvette. Imagine my disappointment when the implied outcome---enhanced personal sex appeal---and impact---hot babes galore!---fail to materialize.)
It's altogether right and proper to question the outcomes and ultimate impacts (if not the hyperimpacts) of foundation initiatives. The trick is to do so in a way that promotes good grantmaking rather than hinders it, and in that sense your caveats and objections are well taken.
Cheers!
Posted by: mugwumpiana | April 03, 2008 at 10:35 AM
mugwumpiana, I haven't seen your comments here before. They're great, and I'd love to see you over at my blog, Tactical Philanthropy.
This is a great debate you two are having. In the for-profit sector, the impact is always the same: make money. Why do people buy the gas? Who cares. If the company can generate a profit, then the investors are happy. But as mugwumpiana points out, nonprofits only exist to further their mission.
Imagine a nonprofit whose mission was to prevent elderly people from dying due to cold weather. They would probably want to supply heat to these people's living quarters. The impact they seek is keeping people from freezing to death. The gas, flame, and heat are only relevant to the extent they prevent death. If this nonprofit ran around hooking natural gas pipes up to people's houses for free, the only question the funder should care about is: did you prevent deaths?
We don't ask the gas company for their impact because we don't care. If they are turning a profit, they are achieving their goal. But we ask the nonprofit for their impact (saving lives), because it may not be self-evident whether they are achieving their goal.
Posted by: Sean Stannard-Stockton | April 04, 2008 at 10:41 AM
Great discussion! Warning -- long post.
Let me play out the gas company analogy another way. I want to be well-fed and warm -- these are the impacts I am seeking to achieve. The energy in my house for heat and cooking is the outcome that should lead to those impacts. And gas is one way -- but only one of several -- to deliver that outcome. I could have electric energy, which could be supplied by the local electric utility or by the windmill or solar panels on my house.
As a consumer, what I care about most is the impact, but I also care about the cost, efficiency and carbon footprint of the mechanisms for achieving that impact. I can do some research on those things and figure out what tradeoffs among cost, efficiency and carbon output I am willing to make. I make these tradeoffs, though, based on knowing fairly clearly the desired impacts -- the temperature I want my house to be, how much I cook, etc. I might revise some of these (keep my house cooler, for example), and I personally don't work this out on a spreadsheet, but the variables are generally known. I could even do something like put in new windows or add insulation to help achieve the warm-house impact.
Take this to a social program now. If the impact we are trying to achieve is children reading at grade level by grade 3 (which research shows us is critical to long-term academic and career success), there are any number of ways we can try to have that impact: we can reconfigure schools, activate parent groups, revise curricula, train teachers differently, etc. What we don't know, however, is how to trade all those off to achieve the desired impact.
Ideally, evaluation of foundation programs should help us to learn how to make those tradeoffs. In reality, though, each particular school will have its own set of circumstances (it is a complex system that we are trying to change), such that mixing two parts teacher training and one part class-size reduction and three parts parent involvement is not necessarily the recipe for success in every case. Or, to stick with the original metaphor, the cost, efficiency and carbon footprint of these strategies differ widely in different settings.
The national movement toward evidence-based practice (most pronounced in health care, but gaining ground in education) is very similar to the "distributed evaluation" model. Methods that are shown to be effective in clinical trials or rigorous educational outcome studies are the ones practitioners are encouraged to adopt. This is the research that is funded by the federal government -- not foundations.
My belief is that what foundation evaluations can and should do -- working collaboratively with practitioners -- is to develop principles that guide the implementation of proven practice. Back to the recipe analogy, it might read "add flour and knead until dough is stiff" rather than "add 2 cups of flour."
This only works, though, if we know what we are trying to achieve. The metric for the final impact -- reading proficiency by third grade, for example -- HAS to be consistently measured. Otherwise, we can keep revising curricula and convening parents, but forget why we were doing it in the first place. This is not to say that goals won't change -- they should change if there is a reason -- but we don't want them to drift.
Posted by: Teri Behrens | April 04, 2008 at 02:00 PM
You guys must not read a lot of for-profit mission statements.
Microsoft: To enable people and businesses throughout the world to realize their full potential.
"Mr. Gates," asks the skeptical evaluator, "can you demonstrate to me that because of your efforts people are now meeting their full potential?"
GlaxoSmithKline: To improve the quality of human life by enabling people to do more, feel better and live longer.
I'm not feeling so good right now.
FedEx will produce superior financial returns for shareowners by providing high value-added supply chain, transportation, business and related information services through focused operating companies.
Superior to what?
Posted by: erasmus | April 04, 2008 at 04:01 PM
Which raises the question of who is doing the evaluating. For investors, all of those mission statements are implicitly preceded with... "To make money by..."
Posted by: Teri Behrens | April 07, 2008 at 03:17 PM
Thank you so much for this enlightening post and discussion. Spent Saturday in a meeting with a non-profit consultant and supporters defining our new organization. The most frustrating part for me was discussing metrics. I find it difficult to see how what can be measured will show the real impact, especially short term. But the most significant thing you have made me realize is that my primary resistance to expanding the metrics we track is that my co-founder and I cannot take on any more responsibilities. The 'distributed evaluation' concept turned on a light bulb.
Posted by: Jeane Goforth | April 21, 2008 at 04:31 AM