What follows is an excerpt from a book on causality that I was writing in 2020. I eventually abandoned the manuscript because the software ecosystem was not mature enough to fold all of causality into one or two tools. Recently I took out the manuscript again to share some basic insights with a colleague, and I realised that it would also make sense to share an extract here.
Normally when I teach a mathematical technique I don’t need to open with a debate about which technique is best. A lot of mathematical techniques have been in use for decades if not centuries and there is little debate about how and when they should be used. Nor, more importantly, is there a list of alternative methods which claim to solve the same problem but with different results. Causal inference is not quite at that point yet. This is a field going through a massive expansion, with wide-ranging debates about which methods are most appropriate. This is why most textbooks on causal inference begin with a philosophical treatise on cause-and-effect.
You cannot do causal inference until you decide on a method. And you cannot decide on a method until you define cause and effect. We are interested mainly in applications here, so I will avoid most of the philosophical discussion, but I will highlight what is new in current theories and why these changes are groundbreaking.
The fundamental problem of causal inference is that unit (or point) effects can rarely be observed in a data set. For example, if we want to analyse the causal effect that studying mathematics at university had on my life, we don’t have an alternative timeline of my life with which to make comparisons. In some rare cases, medical results can be generated using a method called the Self-Controlled Case Series (SCCS), but this requires sufficient data from the individual before they received the intervention. This approach assumes that nothing else changes in the patient’s life and, more importantly, it does not allow us to examine ‘what if’ style questions about things which did not happen.
As I have said elsewhere, there is as yet no single theory of causality. The frontrunners, which are gaining the most attention and are responsible for the recent popularity of these approaches, are two strongly related frameworks: the potential outcomes framework of Donald Rubin (Rubin 1974) and other authors (Angrist et al. 1996; Holland 1986; Holland and Rubin 1983; Rosenbaum 2017; Rosenbaum and Rubin 1983), and the structural theory of Judea Pearl (Pearl 2009).
Potential outcomes (PO) theory came first and as such is firmly established in the field of epidemiology. A similar approach exists in economics, where it goes under the name instrumental variables and extensive use is made of structural equation modelling (SEM). The basic concept is that subjects are randomly assigned to treatment and control groups, where the treatment group undergoes the intervention of interest (the ‘cause’ under investigation) and the control group does not. Certain conceptual properties must hold, such as the stable unit treatment value assumption (SUTVA): each subject’s outcome depends only on its own treatment, and there is only one version of that treatment. Random assignment, in turn, means that the groups are ‘approximately’ interchangeable apart from the variable of interest.
One effect of particular interest is the average causal effect. This can be estimated by measuring the mean response in each group. Since each individual has a measured value only for what actually happened to it (treatment or control), the mean of the other group can be inserted (as an instrumental variable) for the missing value, and thus the difference per individual between treatment and control can be estimated.
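As a minimal sketch of the difference-in-means idea, the simulation below assigns hypothetical units to treatment or control by coin flip, reveals only the outcome that actually occurred, and estimates the average causal effect from the two group means. The data-generating process, the true effect of 2.0 and all function names are assumptions made for illustration; none of them come from the manuscript.

```python
import random
from statistics import mean


def simulate_trial(n=10_000, true_effect=2.0, seed=0):
    """Simulate a randomised trial (hypothetical data-generating process)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)
        y0 = baseline                 # potential outcome without treatment
        y1 = baseline + true_effect   # potential outcome with treatment
        treated = rng.random() < 0.5  # random assignment by coin flip
        # Only one of the two potential outcomes is ever observed per unit.
        observed = y1 if treated else y0
        units.append((treated, observed))
    return units


def difference_in_means(units):
    """Estimate the average causal effect as mean(treated) - mean(control)."""
    treated = [y for t, y in units if t]
    control = [y for t, y in units if not t]
    return mean(treated) - mean(control)


units = simulate_trial()
ace_estimate = difference_in_means(units)
print(f"estimated average causal effect: {ace_estimate:.2f}")
```

Because assignment is random, the estimate should land close to the assumed true effect even though no single unit ever reveals both of its potential outcomes.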
One notable feature of PO theory is that the intervention under investigation must be theoretically possible. The framework specifically excludes questions which address otherworldly hypotheticals (e.g. if I were taller: I’m not, so there is no PO application possible). This is an example of a bias towards solid real-world applications, which I mentioned in my introduction to epidemiology (in the book manuscript). It has consequences for the ‘philosophy’ of how treatment groups are formed and what kind of instrumental variables are acceptable. More than anything, potential outcomes theory is an attempt to work within the confines of statistical theory in order to answer questions of cause and effect.
Structural theory, in contrast, is built upon a schism from statistical theory. The history of mathematics is one of finding ‘distinctions’ between theories that were initially treated as one, distinctions which led to useful results (e.g. Spencer-Brown (1979)). Judea Pearl noted – and this is where the philosophy typically comes in – that there was a need to distinguish between variables which are observed and those same variables when they are manipulated by the experimenter. This one distinction sets structural theory apart from all of the causal inference methods which attempt to reconcile themselves with modern statistics.
Potential outcomes theory does not allow for the Pearl ‘do’ distinction. However, if we allow for ‘do’-ing, then it is possible to map PO theory via SEMs to structural theory. By choosing this path we are able to side-step many of the debates which have taken place over the past 30 years about potentially conflicting approaches under the PO framework. By introducing ‘do’-logic notation, Pearl has shown that these conflicts disappear. This, then, is the path which we will take.
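The gap between observing a variable and setting it can be illustrated with a small simulation. The sketch below assumes a hypothetical model in which a confounder Z influences both X and Y; the coefficients (1.0 for X, 3.0 for Z), the probabilities and the function names are all invented for illustration. Conditioning on X in observed data mixes the effect of X with that of Z, whereas forcing X to a value, in the spirit of the ‘do’ distinction, cuts the link from Z to X.

```python
import random
from statistics import mean


def sample(rng, do_x=None):
    """Draw one unit from a hypothetical confounded model."""
    z = rng.random() < 0.5  # confounder
    if do_x is None:
        # Observation: X depends on the confounder Z.
        x = rng.random() < (0.8 if z else 0.2)
    else:
        # Intervention: X is set by the experimenter, ignoring Z.
        x = do_x
    y = 1.0 * x + 3.0 * z + rng.gauss(0.0, 1.0)
    return x, y


def observed_contrast(n=20_000, seed=0):
    """Mean of Y where X was seen to be true minus mean where it was false."""
    rng = random.Random(seed)
    data = [sample(rng) for _ in range(n)]
    return mean(y for x, y in data if x) - mean(y for x, y in data if not x)


def interventional_contrast(n=20_000, seed=0):
    """Mean of Y when X is forced true minus mean when X is forced false."""
    rng = random.Random(seed)
    y_do1 = mean(sample(rng, do_x=True)[1] for _ in range(n))
    y_do0 = mean(sample(rng, do_x=False)[1] for _ in range(n))
    return y_do1 - y_do0


seeing = observed_contrast()
doing = interventional_contrast()
print(f"contrast when X is observed: {seeing:.2f}")
print(f"contrast when X is set:      {doing:.2f}")
```

Under these assumptions the observed contrast overstates the effect of X, because units with X true are also more likely to have Z true, while the interventional contrast recovers the assumed coefficient of 1.0.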
Structural Theory Counterfactuals
The Pearl school of causal inference places a lot of emphasis on something called counterfactuals. This can be frustrating at first, until the reason for it is explained carefully. To statisticians and epidemiologists in particular, it seems like an arcane philosophical issue, and one which can never be proved! Even to readers open to this approach it can appear largely irrelevant and a distraction.
Counterfactuals are the examination of questions of the form, “What if?” when looking at past data. Expressed in a more readable example, “What would have happened if X had been different?” (American idiom not British English).
Scientifically speaking, we cannot roll back time and re-run the experiment while somehow setting X to a different value (x′). Alternative methods of causal inference sometimes look at this issue in an entirely probabilistic framework. Essentially, they look at the (often Monte Carlo) alternative realities which the various distributional properties of the model might have generated, and then select the ones where X = x′. This is a (mathematically) elegant and non-interventional approach. It does not conflict with existing mathematical theories.
Pearl’s approach takes a step back mathematically. He introduces a distinction between probabilistic models and causal models. This distinction is essentially the only thing which separates Pearl’s causal inference from alternative approaches. However, the downstream effects of this difference are enormous.
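To make the contrast concrete, here is a toy sketch under strong assumptions: a hypothetical structural equation Y = 2X + U with unobserved noise U, and one unit observed with X = 1 and Y = 5. The first function follows the three steps described in Pearl (2009): infer the unit’s noise term from what was observed (abduction), set X to the alternative value (action), and recompute Y (prediction). The second function is a loose reading of the purely probabilistic route: simulate many worlds and keep those in which X = x′. The equation, the coefficient and all names are illustrative only.

```python
import random
from statistics import mean

BETA = 2.0  # assumed structural coefficient in Y = BETA * X + U


def structural_y(x, u):
    """Hypothetical structural equation for Y."""
    return BETA * x + u


def counterfactual_y(x_obs, y_obs, x_alt):
    """Answer 'what would Y have been for this unit had X been x_alt?'"""
    u = y_obs - BETA * x_obs       # abduction: recover this unit's noise term
    return structural_y(x_alt, u)  # action and prediction: set X, recompute Y


def filtered_mean_y(x_alt, n=50_000, seed=1):
    """Simulate many worlds and average Y over those where X equals x_alt."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        x = rng.choice([0, 1])
        if x == x_alt:
            ys.append(structural_y(x, u))
    return mean(ys)


unit_answer = counterfactual_y(x_obs=1, y_obs=5.0, x_alt=0)
population_answer = filtered_mean_y(x_alt=0)
print(f"counterfactual Y for the observed unit: {unit_answer:.2f}")
print(f"mean Y across simulated worlds with X = 0: {population_answer:.2f}")
```

The two numbers answer different questions. The first is tied to the particular unit that was observed, while the second describes an average over simulated units that happened to have X = 0.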
Sticking With Statistics
In this book we will focus on the structural theory approach to counterfactuals. To give you an understanding of how this approach differs from one driven by statistical theory, I will conclude this chapter with a number of examples (mostly taken from Pearl (2009)) in which statistical theory struggles to set a clear decision path for the data scientist.
…