Models, identifiability, and estimability in causal inference


Oliver J. Maclaren¹ Ruanui Nicholson¹

Abstract

Here we discuss two common but, in our view, misguided assumptions in causal inference. The first assumption is that one requires potential outcomes, directed acyclic graphs (DAGs), or structural causal models (SCMs) for thinking about causal inference in statistics. The second is that identifiability of a quantity implies estimability of that quantity. These views are not universal, but we believe they are sufficiently common to warrant comment.

1. Overview

The focus of this extended abstract is two common but, in our view, misguided assumptions in causal inference. While these assumptions are not universal, and causal inference is diverse and multidisciplinary, we believe explicit discussion of them is worthwhile. The first assumption concerns the role and meaning of models in causal inference. It is common to assume that causal inference in statistics necessarily requires special causal modelling formalisms such as potential outcomes, directed acyclic graphs (DAGs), or structural causal models (SCMs). The second assumption concerns the relationship between identifiability and estimability.

Formal logics of causal inference often take identifiability of a quantity to imply its statistical estimability, and therefore give identification primary importance. Here estimability means, intuitively, that statistical estimation with finite error guarantees is possible. Maclaren & Nicholson (2019) give a detailed background and analysis of the above assumptions and explain why they are misguided. The present work gives a condensed overview of their article.

1.1. Causal models and statistical frameworks

The first assumption above is closely related to how the term ‘model’ should be understood in causal inference and statistics. For example, is a model a single probability distribution, a family of distributions, a ‘generative mechanism’, or a set of structural equations? Or something else? A more general, informal definition of ‘model’ is simply: ‘theoretical construct that implies distributions over observables’.

¹Department of Engineering Science, The University of Auckland, Auckland, New Zealand. Correspondence to: Oliver J. Maclaren <[email protected]>.

Workshop on the Neglected Assumptions in Causal Inference (NACI) at the 38th International Conference on Machine Learning, 2021.

Starting from this perspective, Maclaren & Nicholson (2019) translate a standard DAG/SCM causal inference framework into an abstract statistical framework. In (1), we give a high-level view of this translation, with the left-hand side based on Pearl & Bareinboim (2014), and the right-hand side a further abstracted version of the statistical framework for inverse problems given by Evans & Stark (2002):

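$$
\begin{aligned}
\mathcal{M} &\leftrightarrow \Theta \\
M_1, M_2 \in \mathcal{M} &\leftrightarrow \theta_1, \theta_2 \in \Theta \\
Q(M) &\leftrightarrow q(\theta) \\
P : \mathcal{M} \to \mathcal{P},\ M \mapsto P(M) &\leftrightarrow P : \Theta \to \mathcal{P},\ \theta \mapsto P(\theta).
\end{aligned}
\tag{1}
$$

In the above, structural causal models in the sense of Pearl & Bareinboim (2014), symbolised by $M_1, M_2$, correspond to abstract models or ‘theories’ $\theta_1, \theta_2$; the causal class of Pearl & Bareinboim (2014), $\mathcal{M}$, corresponds to the abstract model space $\Theta$ to which $\theta_1, \theta_2$ belong; and causal queries $Q(M)$ correspond to (interest) parameters or ‘queries’ $q(\theta)$.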

The function $P$ on the left, which maps any fully-specified structural causal model $M$ to its probability distribution $P(M)$, is translated as the so-called ‘forward mapping’ $P$ in the abstract framework.

Both interventional and counterfactual concepts can be expressed as interest parameters in the above abstract statistical framework. Importantly, these are defined as functions or functionals on a basic ‘model space’, rather than the space of distributions. This translation is fully compatible with specific causal modelling frameworks like SCMs or DAGs but also expands the scope of causal inference to include model types often neglected in the causal inference literature, for example differential equations, agent-based models, or continuous-time stochastic process models.
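As a minimal sketch of how a non-SCM model fits the abstract framework, the following Python example treats a one-compartment differential equation as a point in a model space. Everything specific here is a hypothetical illustration rather than an example from the source: the model dx/dt = -k x + u, the Gaussian measurement noise, and the names `Theta`, `forward`, and `query_steady_state_under_do`. The forward mapping returns the parameters of the distribution over observables, while the interventional query is a function of the model itself rather than of that distribution.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Theta:
    """A point in a hypothetical model space: dx/dt = -k*x + u, x(0) = x0."""
    k: float      # decay rate, assumed > 0
    u: float      # constant input under observational conditions
    x0: float     # initial state
    sigma: float  # standard deviation of additive Gaussian measurement noise


def forward(theta, times):
    """Forward mapping P(theta): the distribution over noisy observations.

    The distribution is Gaussian with independent components, so it is
    represented here by its mean vector and common standard deviation.
    """
    steady = theta.u / theta.k
    means = [steady + (theta.x0 - steady) * math.exp(-theta.k * t) for t in times]
    return means, theta.sigma


def query_steady_state_under_do(theta, u_new):
    """Interest parameter q(theta): steady state of x if the input were set to u_new."""
    return u_new / theta.k


theta = Theta(k=0.5, u=1.0, x0=0.0, sigma=0.1)
means, sigma = forward(theta, times=[0.0, 1.0, 2.0])
effect = query_steady_state_under_do(theta, u_new=2.0)
print("observation means:", means)
print("steady state under do(u = 2.0):", effect)
```

The query is evaluated on `theta` directly; whether it can be recovered from the distribution returned by `forward` is a separate question, which is exactly the identifiability question discussed next.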

1.2. Identifiability and estimability

The second assumption arises from a common idea in the formal causal inference literature (e.g. Pearl & Bareinboim, 2014, and references therein). This idea is that there is a natural separation of concerns between causal inference and statistical estimation, giving a division of labour of the form:

• First determine what can be estimated in principle from data using formal logics of identifiability analysis.

• Then use statistical theory to help design a particular efficient (or otherwise desirable) estimator.

Maclaren & Nicholson (2019) re-analyse this purported dividing line between statistical and causal questions from first principles, focusing on a careful analysis of the concept of identifiability. Authors in the formal causal inference literature sometimes take identifiability of a quantity as synonymous with ‘estimable from data’. For example, Pearl & Bareinboim (2014, p. 583) give the following definition and description (emphasis ours):

The following definition captures the requirement that $Q$ be estimable from the data:

Causal query $Q(M)$ is identifiable, given a set of assumptions $A$, if for any two (fully-specified) models, $M_1$ and $M_2$, that satisfy $A$, we have

$$
P(M_1) = P(M_2) \implies Q(M_1) = Q(M_2).
$$

In the above, $P(M_1), P(M_2)$ denote the probability distributions implied by the causal models $M_1, M_2$, rather than probabilities of the models.
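The definition can be probed numerically by searching for a pair of models that imply the same observable distribution but give different answers to the query. The sketch below uses a hypothetical binary model with an unobserved confounder U of X and Y. The parameter values and the function names `observed_joint`, `query_do`, and `find_counterexample` are illustrative assumptions, and the assumption set A is taken to place no restriction on confounding. Finding such a pair shows non-identifiability; failing to find one in a finite list proves nothing about the whole model class.

```python
from itertools import product


def observed_joint(pu, px, py):
    """Forward mapping: P(X=x, Y=y) with the unobserved U marginalised out.

    pu = P(U=1); px[u] = P(X=1 | U=u); py[x][u] = P(Y=1 | X=x, U=u).
    """
    joint = {}
    for x, y in product((0, 1), repeat=2):
        total = 0.0
        for u in (0, 1):
            p_u = pu if u else 1 - pu
            p_x = px[u] if x else 1 - px[u]
            p_y = py[x][u] if y else 1 - py[x][u]
            total += p_u * p_x * p_y
        joint[(x, y)] = total
    return joint


def query_do(pu, px, py):
    """Causal query Q(M) = P(Y=1 | do(X=1)) = sum_u P(u) P(Y=1 | X=1, u)."""
    return (1 - pu) * py[1][0] + pu * py[1][1]


def find_counterexample(models, tol=1e-9):
    """Return a pair with equal observable distributions but different queries, if any."""
    for i, m1 in enumerate(models):
        for m2 in models[i + 1:]:
            p1, p2 = observed_joint(*m1), observed_joint(*m2)
            same_p = all(abs(p1[key] - p2[key]) <= tol for key in p1)
            if same_p and abs(query_do(*m1) - query_do(*m2)) > tol:
                return m1, m2
    return None


# Model 1: Y depends on X only (U affects X but not Y).
m1 = (0.5, [0.2, 0.8], [[0.2, 0.2], [0.8, 0.8]])
# Model 2: Y depends on U only (X has no effect on Y).
m2 = (0.5, [0.1, 0.9], [[0.125, 0.875], [0.125, 0.875]])

pair = find_counterexample([m1, m2])
print("counterexample found:", pair is not None)
print("Q(M1), Q(M2):", query_do(*m1), query_do(*m2))
```

Adding an assumption to A that excludes the second model, such as no dependence of Y on U, removes this particular counterexample.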

In their re-analysis of the identifiability concept, Maclaren & Nicholson (2019) take a relatively abstract approach, using methods from elementary category theory. They explicitly prove several intuitive results about identifiability from first principles. These results apply to models expressed as DAGs and SCMs but, because of the general setting, also apply to model types such as differential equations or agent-based models. They use this general framework to formally prove that identifiability of the forward mapping implies the existence of Fisher-consistent functionals defined on distributions. These results include an equivalence between identifiability of general queries (including interventional or counterfactual queries) and the existence of corresponding functionals of distributions over observable quantities.
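The equivalence between identifiability of a query and the existence of a functional on distributions can be sketched in elementary set-theoretic terms, using the notation of (1). The symbol $T$ for the functional is our own labelling, and this is only an outline of the standard argument, not the category-theoretic proof given by Maclaren & Nicholson (2019).

```latex
% Claim: q is identifiable  <=>  q = T \circ P for some T defined on the image P(\Theta).

% (=>) Suppose P(\theta_1) = P(\theta_2) \implies q(\theta_1) = q(\theta_2).
% Define T on the image of P by
\[
  T\bigl(P(\theta)\bigr) := q(\theta), \qquad \theta \in \Theta .
\]
% T is well defined: if P(\theta_1) = P(\theta_2), identifiability gives
% q(\theta_1) = q(\theta_2), so the value of T does not depend on the chosen \theta.
% By construction T is Fisher consistent:
\[
  T\bigl(P(\theta)\bigr) = q(\theta) \quad \text{for all } \theta \in \Theta .
\]

% (<=) Suppose q = T \circ P. If P(\theta_1) = P(\theta_2), then
\[
  q(\theta_1) = T\bigl(P(\theta_1)\bigr) = T\bigl(P(\theta_2)\bigr) = q(\theta_2),
\]
% which is the identifiability condition.
```

Nothing in this argument constrains how $T$ behaves under small changes to its argument, which is why continuity or stability has to be examined separately.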

At the same time, Maclaren & Nicholson (2019) show that these functionals and associated estimators lack stability and statistical guarantees. They review existing results to this effect from statistics, going back to at least the 1950s, e.g. Bahadur & Savage (1956), also including more recent results by influential researchers in causal inference such as Robins et al. (2003). They then provide simple examples, taking the perspective of ill-posed inverse problems, where causal quantities are identified but inestimable (any estimator having arbitrarily bad statistical properties). This illustrates that identifiability is an inadequate, or at least incomplete, characterisation of the more general concept of ‘can be estimated from data’.
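The kind of obstruction behind such results can be illustrated with the mean, which is trivially identified because it is a functional of the distribution. The sketch below is in the spirit of Bahadur & Savage (1956) but is not their construction. It assumes a hypothetical base distribution P0 with mean zero and no point masses, contaminated by a point mass of weight `eps` placed at `c / eps`. The names `contaminated_mean` and `prob_sample_unaffected` are our own.

```python
def contaminated_mean(base_mean, eps, c):
    """Mean of the mixture (1 - eps) * P0 + eps * (point mass at c / eps)."""
    return (1 - eps) * base_mean + eps * (c / eps)


def prob_sample_unaffected(eps, n):
    """Probability that an i.i.d. sample of size n contains no contaminated draw."""
    return (1 - eps) ** n


# Assumption: P0 is continuous (e.g. standard normal), so the total variation
# distance between P0 and the contaminated distribution equals eps.
n = 1000
c = 10.0
for eps in (1e-4, 1e-6, 1e-8):
    shift = contaminated_mean(0.0, eps, c)
    unaffected = prob_sample_unaffected(eps, n)
    print(f"eps={eps:g}  mean shift={shift:g}  P(sample looks like P0)={unaffected:.6f}")
```

As `eps` shrinks, the two distributions become nearly indistinguishable from a sample of fixed size, yet their means differ by `c`, which can be chosen arbitrarily. No estimator can therefore offer error guarantees for the mean that hold uniformly over such a family without further restrictions.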

Figure 1 shows a simple example taken from Maclaren & Nicholson (2019), where estimating the causal quantity $p(y \mid do(x))$ reduces to an identified but ill-posed (arbitrarily unstable) problem of estimating a conditional probability density $f$ from the cumulative distribution $F$.

[Figure 1: excerpt from Maclaren & Nicholson (2019) containing two plots, a density f against y and a cumulative distribution F against y.]

Figure 1. Illustration of an example ill-posed integral equation of the form $Kf = F$, taken from Maclaren & Nicholson (2019). Small perturbations to the right-hand side $F$ can give large changes to the solution $f$. Maclaren & Nicholson (2019) give an example of estimating $p(y \mid do(x))$ which has this form. This causal inference task is ill-posed (unstable), even given identifiability.
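The instability described in the caption can be reproduced with a standard textbook perturbation, which is not claimed to be the exact example of Maclaren & Nicholson (2019). Take $K$ to be integration from 0 to y on [0, 1], with uniform density f(y) = 1 and F(y) = y. The perturbed pair assumed here is F_delta(y) = y + delta * sin(y / delta) and f_delta(y) = 1 + cos(y / delta), which satisfies K f_delta = F_delta. Choosing delta = 1 / (2 * pi * m) keeps F_delta a valid distribution function with F_delta(1) = 1.

```python
import math


def sup_errors(m, grid_size=10001):
    """Sup-norm errors on a grid over [0, 1] for perturbation level delta = 1/(2*pi*m)."""
    delta = 1.0 / (2.0 * math.pi * m)
    ys = [i / (grid_size - 1) for i in range(grid_size)]
    # Error in the data: |F_delta(y) - F(y)| = |delta * sin(y / delta)|
    cdf_err = max(abs(delta * math.sin(y / delta)) for y in ys)
    # Error in the solution: |f_delta(y) - f(y)| = |cos(y / delta)|
    dens_err = max(abs(math.cos(y / delta)) for y in ys)
    return delta, cdf_err, dens_err


rows = [sup_errors(m) for m in (1, 10, 100)]
for delta, cdf_err, dens_err in rows:
    print(f"delta={delta:.5f}  sup|F_delta - F|={cdf_err:.5f}  sup|f_delta - f|={dens_err:.5f}")
```

The error in F shrinks in proportion to delta while the error in f stays at 1, so the perturbed solution never approaches the unperturbed one even though f is uniquely determined by F.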

2. Discussion and conclusions

Here we have sketched key results contained in Maclaren & Nicholson (2019). A general lesson is that embracing an abstract statistical formalism can enable a broader and clearer perspective on the meaning of ‘models’ in causal inference and statistics. This perspective also opens the door to collaborations with many scientists, engineers, and applied mathematicians using classes of models other than SCMs or DAGs. A second lesson is that focusing on pure identification results without establishing associated estimability results can lead to misleading conclusions about what is achievable in practice.

Acknowledgements

OJM received support from the University of Auckland, Faculty of Engineering James and Hazel D. Lord Emerging Faculty Fellowship. OJM would also like to thank #statstwitter for numerous helpful conversations and encouragement for this line of work.

References

Bahadur, R. R. and Savage, L. J. The nonexistence of certain statistical procedures in nonparametric problems. Ann. Math. Stat., 27(4):1115–1122, 1956.

Evans, S. N. and Stark, P. B. Inverse problems as statistics. Inverse Probl., 18(4):R55, 2002.

Maclaren, O. J. and Nicholson, R. What can be estimated? Identifiability, estimability, causal inference and ill-posed inverse problems. arXiv preprint, arXiv:1904.02826, 2019.

Pearl, J. and Bareinboim, E. External validity: From do-calculus to transportability across populations. Stat. Sci., 29(4):579–595, 2014.

Robins, J. M., Scheines, R., Spirtes, P., and Wasserman, L. Uniform consistency in causal inference. Biometrika, 90(3):491–515, 2003.
