Is the Methodological Axiom of the Potential Outcomes Approach Circular?

Hernan, VanderWeele, and others argue that causation (or a causal question) is well-defined when interventions are well-specified. I take this to be a sort of methodological axiom of the approach.

But what is a well-specified intervention?

Consider an example from Hernan & Taubman’s influential 2008 paper on obesity. In that paper, BMI is shown up as failing to correspond to a well-specified intervention; better-specified interventions include one hour of strenuous physical exercise per day (among others).

But what kind of exercise? One hour of running? Powerlifting? Yoga? Boxing?

It might matter – it might turn out that, say, boxing and running for an hour a day reduce BMI by similar amounts but that one of them is associated with longer life. Or it might turn out not to matter. Either way, it would be a matter of empirical inquiry.

This has two consequences for the mantra that well-defined causal questions require well-specified interventions.

First, as I’ve pointed out before on this blog, it means that experimental studies don’t necessarily guarantee well-specified interventions. Just because you can do it doesn’t mean you know what you are doing. The differences you might think don’t matter might matter: different strains of broccoli might have totally different effects on mortality, etc.

Second, more fundamentally, it means that the whole approach is circular. You need a well-specified intervention for a good empirical inquiry into causes and you need good empirical inquiry into causes to know whether your intervention is well-specified.

To me this seems to be a potentially fatal consequence for the claim that well-defined causal questions require well-specified interventions. For if that were true, we would be trapped in a circle, and could never have any well-specified interventions, and thus no well-defined causal questions either. Therefore either we really are trapped in that circle; or we can have well-defined causal questions, in which case, it is false that these always require well-specified interventions.

This is a line of argument I’m developing at present, inspired in part by Vandenbroucke and Pearce’s critique of the “methodological revolution” at the recent WCE 2014 in Anchorage. I would welcome comments.

Causation, prediction, epidemiology – talks coming up

Perhaps an odd thing to do, but I’m posting the abstracts of my two next talks, which will also become papers. Any offers to discuss/read welcome!

The talks will be at Rhodes on 1 and 3 October. I’ll probably deliver a descendant of one of them at the Cambridge Philosophy of Science Seminar on 3 December, and may also give a very short version of 1 at the World Health Summit in Berlin on 22 Oct.

1. Causation and Prediction in Epidemiology

There is an ongoing “methodological revolution” in epidemiology, according to some commentators. The revolution is prompted by the development of a conceptual framework for thinking about causation called the “potential outcomes approach”, and the mathematical apparatus of directed acyclic graphs that accompanies it. But once the mathematics are stripped away, a number of striking assumptions about causation become evident: that a cause is something that makes a difference; that a cause is something that humans can intervene on; and that epidemiologists need nothing more from a notion of causation than picking out events satisfying those two criteria. This is especially remarkable in a discipline that has variously identified factors such as race and sex as determinants of health. In this talk I seek to explain the significance of this movement in epidemiology, separate its insights from its errors, and draw a general philosophical lesson about confusing causal knowledge with predictive knowledge.

2. Causal Selection, Prediction, and Natural Kinds

Causal judgements are typically – invariably – selective. We say that striking the match caused it to light, but we do not mention the presence of oxygen, the ancestry of the striker, the chain of events that led to that particular match being in her hand at that time, and so forth. Philosophers have typically but not universally put this down to the pragmatic difficulty of listing the entire history of the universe every time one wants to make a causal judgement. The selective aspect of causal judgements is typically thought of as picking out causes that are salient for explanatory or moral purposes. A minority, including me, think that selection is more integral than that to the notion of causation. The difficulty with this view is that it seems to make causal facts non-objective, since selective judgements clearly vary with our interests. In this paper I seek to make a case for the inherently selective nature of causal judgements by appealing to two contexts where interest-relativity is clearly inadequate to fully account for selection. Those are the use of causal judgements in formulating predictions, and the relation between causation and natural kinds.

JOB: Post Doc – prediction, philosophy of epidemiology, philosophy of science

The Department of Philosophy at the University of Johannesburg seeks to appoint a postdoctoral research fellow to work under the supervision of Prof Alex Broadbent. In particular, ideas for work on (1) prediction or (2) philosophy of epidemiology are welcome; but any area of speciality within the philosophy of science broadly construed (including the philosophy of medicine) will be considered. Please send a CV, cover letter, and writing sample to abbroadbent@uj.ac.za by 26 September 2014. PhD must be in hand. Start date February 2014. Informal inquiries welcome to the same email address.

A Tale of Two Papers

I’m on my way back from the World Epi Congress in Anchorage, where causation and causal inference have been central topics of discussion. I wrote previously about a paper (Hernan and Taubman 2008) suggesting that obesity is not a cause of mortality. There is another, more recent paper published in July of this year, suggesting, more or less, that race is not a cause of health outcomes – or at least that it’s not a cause that can feature in causal models (VanderWeele and Robinson 2014). I can’t do justice to the paper here, of course, but I think this is a fair, if crude, summary of the strategy.

This paper is an interesting comparator for the 2008 obesity paper (Hernan and Taubman 2008). It shares the idea that there is a close link between (a) what can be humanly intervened on, (b) what counterfactuals we can entertain, and (c) what causes we can meaningfully talk about. This is a radical view about causation, much stronger than any position held by any contemporary philosopher of whom I’m aware. Philosophers who do think that agency or intervention are central to the concept of causation treat the interventions as in-principle ones, not things humans could actually do.

Yet feasibility of manipulating a variable really does seem to be a driver in this literature. In the paper on race, the authors consider what variables form the subject of humanly possible interventions, and suggest that rather than ask about the effect of race, we should ask what effect is left over after these factors are modelled and controlled for, under the umbrella of socioeconomic status. That sounds to me a bit like saying that we should identify the effects of being female on job candidates’ success by seeing what’s left after controlling for skirt wearing, longer average hair length, shorter stature, higher pitched voice, female names, etc. In other words, it’s very strange indeed. Perhaps it could be useful in some circumstances, but it doesn’t really get us any further with the question of interest – how to quantify the health effects of race, sex, and so forth.

Clearly, there are many conceptual difficulties with this line of reasoning. A good commentary was published with the paper (Glymour and Glymour 2014) which really dismantles the logic of the paper. But I think there are a number of deeper and more pervasive misunderstandings to be cleared up, misunderstandings which help explain why papers like this are being written at all. One is confusion between causation and causal inference; another is confusion between causal inference and particular methods of causal inference; and a third is a mix-up between fitting your methodological tool to your problem, and your problem to your tool.

The last point is particularly striking. What’s so interesting about these two papers (2008 & 2014) is that they seem to be trying to fit research problems to methods, not trying to develop methods to solve problems – even though this is ostensibly what they (at least VW&R 2014) are trying to do. To me, this is strongly reminiscent of Thomas Kuhn’s picture of science, according to which an “exemplary” bit of science occurs, and initiates a “paradigm”, which is a shared set of tools for solving “puzzles”. Kuhn was primarily influenced by physics, but this way of seeing things seems quite apt to explain what is otherwise, from the outside, really quite a remarkable, even bizarre, about-turn. Age, sex, race – these are staple objects of epidemiological study as determinants of health; and they don’t fit easily into the potential outcomes paradigm. It’s fascinating to watch the subsequent negotiation. But I’m quite glad that it doesn’t look like epidemiologists are going to stop talking about these things any time soon.

References

Glymour C and Glymour MR. 2014. ‘Race and Sex Are Causes.’ Epidemiology 25 (4): 488-490.

Hernan M and Taubman S. 2008. ‘Does obesity shorten life? The importance of well-defined interventions to answer causal questions.’ International Journal of Obesity 32: S8–S14.

VanderWeele TJ and Robinson WR. 2014. ‘On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables.’ Epidemiology 25(4): 473-484.

Snakes, statistics, and goals for the goal-setters

Cesar Victora gave a very interesting talk earlier today concerning the International Epidemiological Association’s position paper on the UN’s Sustainable Development Goals, which are currently being drafted (to replace the Millennium Development Goals post-2015). Victora is President of the IEA, for a few more hours at least (the new President takes office this evening). Many of his points were reiterated by the next speaker, Theodor Abelin, and in questions from the floor. There were no audible voices of dissent. (The talk reflects a fuller position paper, available here.)

The point that stayed with me most from Victora’s rich talk was the importance of relating goals to appropriate measurement techniques. My own interest in epidemiology has tended to focus on efforts to identify causes (“analytic” epidemiology), since causation is a natural magnet for philosophical interest. But measurement is also a focus of philosophical interest, and Victora nicely pointed out that “descriptive” epidemiology – the business of measuring things like maternal mortality rate, for example – is extremely important if these Sustainable Development Goals are to be effective. A country cannot be held to a goal that cannot be measured, and it cannot fairly be held to a goal when progress towards that goal is estimated rather than measured.

For example, I was not surprised to learn that in many countries where maternal mortality is high, data on maternal mortality rates (MMRs) are scarce. What did surprise me was hearing about the calculations that some august international organisations perform in the absence of data. A calculation is performed involving GDP per capita, general fertility rate and skilled birth attendance. MMR is estimated as a function of these and perhaps some other similar variables. This means that if the country goes through a recession, the estimated MMR will automatically go up. Perhaps it really will go up, but it seems strange to think of that calculation as a measurement, at least in the absence of extremely good evidence for the reliability of the estimating equation – evidence which, of course, we don’t have.
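To see why an estimate of this kind tracks the economy by construction, here is a minimal sketch in Python. The log-linear form, the function name and every coefficient are invented for illustration; this is not the equation any organisation actually uses.

```python
import math

# Hypothetical coefficients for a log-linear estimating equation.
# These numbers are made up for illustration only.
INTERCEPT = 9.0
B_LOG_GDP = -0.5        # higher GDP per capita -> lower estimated MMR
B_FERTILITY = 0.01      # higher general fertility rate -> higher estimated MMR
B_SKILLED_BIRTH = -1.0  # higher share of skilled birth attendance -> lower estimated MMR


def estimated_mmr(gdp_per_capita, fertility_rate, skilled_attendance):
    """Return an estimated MMR (deaths per 100,000 live births) from covariates alone."""
    log_mmr = (
        INTERCEPT
        + B_LOG_GDP * math.log(gdp_per_capita)
        + B_FERTILITY * fertility_rate
        + B_SKILLED_BIRTH * skilled_attendance
    )
    return math.exp(log_mmr)


before = estimated_mmr(gdp_per_capita=2000, fertility_rate=120, skilled_attendance=0.5)
# A recession: GDP per capita falls by 10%; no maternal death has been counted.
after = estimated_mmr(gdp_per_capita=1800, fertility_rate=120, skilled_attendance=0.5)

print(f"estimated MMR before recession: {before:.0f}")
print(f"estimated MMR after recession:  {after:.0f}")
```

The estimate rises purely because an input changed, whatever happened to actual maternal deaths. Whether the real rate moved in step is exactly what the equation cannot tell us.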

MMR is measurable, of course. The problem with MMR is simply a lack of data, and this problem afflicts a large class of conditions. As Victora put it in relation to snakebite: “Where we have snakes, we don’t have statistics, and where we have statistics, we don’t have snakes.”

However, Victora’s most penetrating critique of the SDGs concerned the setting of goals in the absence of clear ideas about how progress towards the goals will be measured. The health-related goal is as follows:

“Goal 3. Ensure healthy lives and promote well-being for all at all ages” (from the Outcome Document)

This overarching goal is broken down into 13 subgoals, some of which are very loosely specified. For instance, how are we to tell whether a country has managed to “strengthen prevention and treatment of substance abuse, including narcotic drug abuse and harmful use of alcohol”? Ironically, those goals that are most clearly specified are wildly unattainable, such as halving global deaths and injuries from road traffic accidents by 2020. Those that are not well specified present measurement challenges for epidemiologists.

This made me wonder whether a body like the IEA could itself set some “goals for the goal-setters” – that is, criteria which any health-related goal must meet if, in the professional opinion of the IEA, it is to be useful. The simplest such criterion would be that outcomes must be specified in terms of a recognised epidemiological measure (mortality, for instance). Another might be to accompany each goal with information (perhaps in a corresponding entry in an appendix) concerning the trend over the past similar period: so if the goal is to halve road traffic deaths in 15 years, or 25, information on the growth of road traffic deaths over the past 15 or 25 years might be included. Goals of this kind will always be political, but there might be agreement on a set of simple rules for setting such goals, and if such rules existed, this might pull epidemiologists closer in to the goal-setting process – a kind of politicking which, as one of the questioners pointed out, is not part of standard epidemiological training.
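The comparison between a goal and the past trend comes down to simple arithmetic, sketched below in Python. The function names are my own, the calculation assumes a constant proportional change each year, and the past-trend figures are invented for illustration rather than taken from any data.

```python
def required_annual_change(target_ratio, years):
    """Constant yearly proportional change needed to reach target_ratio after `years`."""
    return target_ratio ** (1.0 / years) - 1.0


def observed_annual_change(start_value, end_value, years):
    """Constant yearly proportional change implied by an observed start and end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Halving deaths means ending at 0.5 of the starting level.
for years in (15, 25):
    rate = required_annual_change(0.5, years)
    print(f"halving in {years} years needs {rate:.1%} per year")

# Hypothetical past trend, invented for illustration:
# deaths rose from 100 to 130 (arbitrary units) over 15 years.
past = observed_annual_change(100.0, 130.0, 15)
print(f"hypothetical past trend: {past:+.1%} per year")
```

Setting the required yearly change next to the observed one gives a reader an immediate sense of how far a goal departs from recent experience.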

 

Potential Outcomes: Separating Insight from Ideology

I’m in Anchorage, preparing for the World Congress of Epidemiology. One of the sessions I’m speaking at is a consultation for the next edition of the Dictionary of Epidemiology. It’s a strange and delightful document, this Dictionary, since it sets out to define not only individual words but also the discipline of epidemiology as a whole. Thus it contains both mundane and metaphysical entries, from “death certificate” to “causality”. I’m billed to talk about “Defining Measures of Causal Strength”. There’s a lot to say: the current entries under causal-related terms could use some disciplining. But I’m particularly interested in orienting myself with regard to the “potential outcomes” view of causation, which seems to be the current big thing among epidemiologists.

The potential outcomes view is associated in particular with Miguel Hernan, a very smart epidemiologist at Harvard, and he has a number of nice papers on it. (I hope I don’t need to say that what follows is not a personal attack: I have great respect for Hernan, and am stimulated by his work. I’m just taking his view as exemplary of the potential-outcomes approach, in the way that philosophers typically do.)

In particular I’ve been engaged in a close reading of a paper on obesity by Hernan and Taubman (2008). Their view, as expressed in that paper, is an interesting mix of pragmatism and idealism. On the one (pragmatic) hand, they argue that causal questions are often ill-formed, and thus unanswerable. There is no answer to the question “What is the effect of body-mass index (BMI) on all-cause mortality?” because the different ways to intervene on BMI may result in different effects on mortality. Diet, exercise, a combination of diet and exercise, smoking, chopping off a limb – these are all ways to reduce BMI. Until we have specified which intervention we have in mind, we cannot meaningfully quantify the contribution of BMI to mortality.
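A toy illustration of the point, in Python. All figures are hypothetical and invented for this sketch: each intervention is assumed to lower BMI by the same amount while changing mortality risk differently.

```python
# Hypothetical figures, invented for illustration only: each intervention lowers
# BMI by the same amount but changes 10-year mortality risk differently.
INTERVENTIONS = {
    "diet": {"bmi_change": -3.0, "risk_change": -0.010},
    "exercise": {"bmi_change": -3.0, "risk_change": -0.020},
    "smoking": {"bmi_change": -3.0, "risk_change": 0.050},
}


def risk_change_per_bmi_unit_lost(name):
    """Change in mortality risk per unit of BMI lost under the named intervention."""
    effect = INTERVENTIONS[name]
    return effect["risk_change"] / abs(effect["bmi_change"])


per_unit = {name: risk_change_per_bmi_unit_lost(name) for name in INTERVENTIONS}
for name, value in per_unit.items():
    print(f"{name}: {value:+.4f} risk change per BMI unit lost")

# Averaging over interventions yields a number, but one that depends entirely on
# which interventions happen to be in the mix, and in what proportions.
pooled = sum(per_unit.values()) / len(per_unit)
print(f"unweighted average: {pooled:+.4f}")
```

On these made-up numbers there is no single “effect of BMI”: the sign of the answer itself depends on which intervention is meant.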

This much is highly reminiscent of contrastivist theories of causation in philosophy. Contrastivist theories take causation to consist in counterfactual dependence, but differ from counterfactual theories in taking the form of causal statements to be implicitly contrastive: not “c causes e” but “c rather than C* causes e rather than E*”, where C* and E* are classes of events that could occur in the absence of c and e respectively. Against this background, Hernan and Taubman’s point is simply that, for an epidemiological investigator, it matters what contrast class we have in mind when we seek to estimate the size of an effect. This is a good point, especially in a context where one hopes to act on a causal finding. One had better be sure that one knows, not only that there is a causal connection between a given exposure and outcome, but also what will happen if a given intervention replaces the factor under investigation. I have called the failure to appreciate this point The Causal Fallacy and linked it to easy errors in prediction (see this previous post and Broadbent 2013, 82).

But there is another more troubling side to the view as it is expressed in this paper: that randomized controlled trials offer a protection against this error, and somehow force us to specify our interventions precisely. The argument for this claim is striking, but on reflection I fear it is specious.

Hernan and Taubman make a striking point: they say that an observational study might appear to be able to answer the question “What is the effect of BMI on all-cause mortality?” via a statistical analysis of data on BMI and mortality, while randomized controlled trials would not be able to answer this question directly: they would only be able to answer questions like: “What is the effect of reducing BMI via dietary interventions? / via exercise? / via both?” This apparent shortcoming of RCTs is, of course, a strength in disguise: the observational study is in fact not so informative, since it does not distinguish the effects of different ways of reducing BMI; while the RCTs do give us this information.

This argument is fallacious, however, for the following reasons.

  1. An observational study that includes the same information as the RCTs on the methods of reducing BMI would also be able to distinguish between the effects of these interventions.
  2. It is true that one could conduct an observational study which ignored the possibility that different methods of reducing BMI might themselves affect mortality. But that would be a bad study, since it would ignore the effects of known confounders. A good study would take these things into account.
  3. Conversely, it is a mistake to suppose that RCTs offer protection against this sort of error. The BMI case is a special one, precisely because there are so many ways to intervene to reduce BMI and we know that these could affect mortality. In truth, there are many ways to make any intervention. One may take a pill or a capsule or a suppository, on the equator or in the tropics, before or after a meal, and so on. Even in an RCT, the intervention is not fully specified. Rather, we simply assume that the differences don’t matter, or that if they do, they are “cancelled out” by the randomisation process.
  4. Randomized controlled trials are not controlled in the manner of true controlled experiments; rather, randomization is a surrogate for controlling. We hope that all the many differences between the circumstances of each intervention in the treatment group will either have no effect or, if they do, will have effects that are randomly distributed so as not to obscure the effect of the treatment. But in principle, it is still possible that this hope is not fulfilled. At a significance threshold of 0.05, chance alone will produce an apparent effect in roughly one in 20 RCTs of a treatment that in fact does nothing; and perhaps more often among published RCTs, given publication bias (i.e. the fact that null results are harder to publish).
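
The one-in-20 figure can be checked with a small simulation of trials in which the treatment has no effect at all. This is only a sketch under simplifying assumptions of my own: two equal arms, normally distributed outcomes with known unit variance, and a two-sided z-test. The function names, sample sizes and seed are arbitrary.

```python
import math
import random


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_trial(rng, n_per_arm=100):
    """Return the p-value of one two-arm trial in which treatment truly does nothing.

    Outcomes in both arms are drawn from the same standard normal distribution,
    so any difference between the arm means is due to chance alone.
    """
    treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    diff = sum(treated) / n_per_arm - sum(control) / n_per_arm
    standard_error = math.sqrt(2.0 / n_per_arm)  # unit variance assumed known in each arm
    return two_sided_p_value(diff / standard_error)


def false_positive_rate(n_trials=2000, alpha=0.05, seed=0):
    """Share of simulated null trials whose p-value falls below alpha."""
    rng = random.Random(seed)
    hits = sum(simulate_null_trial(rng) < alpha for _ in range(n_trials))
    return hits / n_trials


rate = false_positive_rate()
print(f"share of null trials with p < 0.05: {rate:.3f}")
```

The share should come out close to 0.05. If only the “significant” trials were published, every published trial of this useless treatment would report an effect.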

These are familiar points in the philosophical literature on randomised controlled trials (see esp. Worrall 2002). The point I wish to pull out is this. On the one hand, Hernan’s emphasis on getting a well-defined contrastive question is insightful and important. But on the other hand, it is wrong to think that RCTs solve the problem. True, in an RCT you must make an intervention. But it does not follow that one’s intervention is well-specified. There might be all sorts of features of the particular way that you intervene that could skew the results. And conversely, plug the corresponding “how it happened” info into a cohort study, and you will be able to obtain the same sorts of discrimination between these methods.

On top of all this, the focus on the methods of individual studies obscures the most important point of all: that convincing evidence comes from a multitude of studies. Just as an RCT allows us to assume that differences between individuals are evenly distributed and thus ignorable, so a multitude of methodologically inferior studies can provide very strong evidence if their methodological shortcomings are different. This is the kind of situation Hill responded to with his guidelines (NOT criteria!) for inferring causality (Hill 1965). Similarly, ad hoc arguments against each possible alternative explanation can add up to a compelling case, as in the classic paper by Cornfield and colleagues on smoking and lung cancer (Cornfield et al 1959). The recent insights of the potential outcomes approach are valuable and important, but they augment rather than replace these familiar, older insights.

References

Broadbent, A. 2013. Philosophy of Epidemiology. Basingstoke and New York: Palgrave Macmillan.

Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB and Wynder EL. 1959. Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22: 173-203.

Hernan, MA and Taubman, SL. 2008. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. International Journal of Obesity 32: S8-S14.

Hill, Austin Bradford. 1965. The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58: 295-300.

Worrall, J. 2002. What Evidence in Evidence-Based Medicine? Philosophy of Science 69 (S3): S316-S330.

 

Stability: an epidemiological ingredient in the realism debate?

I’m preparing a talk on stability for the New Thinking in Scientific Realism Conference that opens in Cape Town tomorrow. I introduced the notion of stability in my book, defined like this:

“A result, claim, theory, inference, or other scientific output is stable if and only if

(a) in fact, it is not soon contradicted by good scientific evidence; and

(b) given best current scientific knowledge, it would probably not be soon contradicted by good scientific evidence, if good research were done on the topic.” (Broadbent 2013, 63)

The introduction of this notion was a response to the perceived difficulties around “translating” epidemiological (or more generally biomedical) findings into good health policy. At Euroepi in Porto, 2012, I argued that translation was not the main or only difficulty for using epidemiological results, and that stability – or rather, the lack of it – was important. After all, one cannot comfortably rely on a result if one cannot be confident that the next study won’t completely contradict it, and that seems to happen pretty often in at least some areas of epidemiological investigation.

Thus the reasons for introducing the notion were thoroughly practical. More recently, though, I have been trying to tighten up the philosophical credentials of the notion, and that’s what I’m going to be talking about in Cape Town. Is stability epistemically significant? Can it be shown to be epistemically significant without collapsing into approximate truth? Can it be distinguished from approximate truth without collapsing into empirical adequacy? These are the questions I will seek to answer.

What’s interesting for me is that, as far as I can see, it’s pretty easy to answer these questions affirmatively. If I’m right about that, then this will be a nice case where studying actual science gives rise to new philosophical insights. The desire to make public health policy that will not have to be revised six months down the line is eminently practical; yet the proposal of a status that scientific hypotheses might have, distinct from truth and empirical adequacy and all the rest, is eminently abstract. If stability really is both defensible and novel, then it will illustrate the oft-repeated mantra that philosophers of science would benefit from looking more closely at science. I am personally put on guard when I hear that said, not because I disagree in principle, but because experience has taught me to suspect either lip service, or an excuse for poor philosophy. Perhaps I’m also guilty of one or both of these; I will be interested to see what Cape Town says.