CAUSALITY - Discussion (Pearl (2)) Date: February 25, 2007
From: Judea Pearl (UCLA)
Subject: Counterfactuals in linear systems

Question to author:
What do we know about counterfactuals in linear models?

Author's reply:
Glad you asked.

Here is a neat result concerning the testability of counterfactuals in linear systems.
We know that counterfactual queries of the form P(Yx=y|e) may or may not be empirically identifiable, even in experimental studies. For example, the probability of causation, P(Yx=y|x',y'), is in general not identifiable from experimental data (Causality, p. 290, Corollary 9.2.12) when X and Y are binary.[1] (Footnote-1: A complete graphical criterion for distinguishing testable from nontestable counterfactuals is given in Shpitser and Pearl (2007, upcoming).)

This note shows that things are much friendlier in linear analysis:

Claim A. Any counterfactual query of the form E(Yx|e) is empirically identifiable in linear causal models, where e is an arbitrary set of evidence.

Claim B. E(Yx|e) is given by

E(Yx|e) = E(Y|e) + T [x - E(X|e)]      (1)
where T is the total effect coefficient of X on Y, i.e.,
T = d E[Yx]/dx = E(Y|do(x+1)) - E(Y|do(x))      (2)

Thus, whenever the causal effect T is identified, E(Yx|e) is identified as well.

Claim A is not surprising. It was established in full generality by Balke and Pearl (1994b), where expressions involving the covariance matrix were used for the various terms in (1).

Claim B offers an intuitively compelling interpretation of (1), which reads as follows: given evidence e, to calculate E(Yx|e) (i.e., the expectation of Y under the hypothetical assumption that X were x, rather than its current value), first calculate the best estimate of Y conditioned on the evidence e, namely E(Y|e); then add to it whatever change is expected in Y when X undergoes a forced transition from its current best estimate, E(X|e), to its hypothetical value x. That last addition is none other than the effect coefficient T times the expected change in X, i.e., T[x - E(X|e)].
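
To make this interpretation concrete, here is a minimal numerical sketch (not part of the original note; the model, the coefficients a, b, c, and the evidence values are illustrative assumptions). It simulates a linear model with a confounder Z, estimates E(Yx|e) for e: X = x' by rejection sampling, and compares the result with the right-hand side of Eq. (1):

    import numpy as np

    # Toy linear model (illustrative): Z = Uz, X = a*Z + Ux, Y = b*X + c*Z + Uy.
    # The only directed path from X to Y is the edge X -> Y, so T = b.
    rng = np.random.default_rng(0)
    a, b, c = 0.8, 1.5, -0.7
    n = 2_000_000

    Uz, Ux, Uy = rng.standard_normal((3, n))
    Z = Uz
    X = a * Z + Ux
    Y = b * X + c * Z + Uy

    x_prime, x_new, eps = 1.0, 3.0, 0.05
    keep = np.abs(X - x_prime) < eps        # evidence e: X ~ x'

    # Counterfactual Y_x on the retained units: force X = x_new while
    # keeping each unit's background values Z and Uy at their factual values.
    Y_cf = b * x_new + c * Z[keep] + Uy[keep]

    # Right-hand side of Eq. (1): E(Y|e) + T*[x - E(X|e)].
    rhs = Y[keep].mean() + b * (x_new - X[keep].mean())

    print("Monte Carlo E(Yx|e):", round(Y_cf.mean(), 4))
    print("Eq. (1) prediction: ", round(rhs, 4))

The two printed numbers agree up to sampling error, even though Z confounds X and Y.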

Note: Eq. (1) can also be written in do(x) notation as

E(Yx|e) = E(Y|e) + E(Y|do(x)) - E[Y|do(X=E(X|e))]      (1')
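
As a quick symbolic sanity check (a sketch, not from the note; it assumes the zero-mean linear setting of the proof below, in which E(Y|do(x)) = Tx by Eq. (2)), one can verify that (1') reduces to (1):

    import sympy as sp

    T, x, E_Y_e, E_X_e = sp.symbols('T x E_Y_e E_X_e')

    E_Y_do = lambda v: T * v    # E(Y|do(v)) = T*v in a zero-mean linear model

    eq1 = E_Y_e + T * (x - E_X_e)                    # Eq. (1)
    eq1_prime = E_Y_e + E_Y_do(x) - E_Y_do(E_X_e)    # Eq. (1')

    assert sp.simplify(eq1 - eq1_prime) == 0         # identical expressions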

Proof:
(with help from Ilya Shpitser)

Assume, without loss of generality, that we are dealing with a zero-mean model. Since the model is linear, we can write the relation between X and Y as:

Y = TX + I + U      (3)
where T is the total effect of X on Y, given in (2), I represents terms containing other variables in the model (nondescendants of X), and U represents exogenous variables.

It is always possible to bring the function determining Y into the form (3) by recursively substituting the functions for each rhs variable that has X as an ancestor, and grouping all the X terms together to form TX. Clearly, T is the Wright-rule sum, over all directed paths from X to Y, of the products of the path coefficients along each path (Wright, 1921).
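
For concreteness, here is a minimal sketch of Wright's rule in code (the graph and coefficients are hypothetical, chosen only for illustration): T is obtained by summing, over every directed path from X to Y, the product of the structural coefficients along that path.

    # Edges of a hypothetical linear model, mapping (parent, child) to its
    # structural coefficient: X -> M -> Y plus a direct edge X -> Y.
    edges = {('X', 'M'): 0.5, ('M', 'Y'): 2.0, ('X', 'Y'): 1.0}

    def total_effect(edges, source, target):
        """Sum, over all directed paths from source to target, of the
        products of the edge coefficients along each path (Wright, 1921).
        Assumes the directed graph is acyclic."""
        children = {}
        for (u, v), coef in edges.items():
            children.setdefault(u, []).append((v, coef))
        def walk(node, prod):
            if node == target:
                return prod
            return sum(walk(v, prod * c) for v, c in children.get(node, []))
        return walk(source, 1.0)

    print(total_effect(edges, 'X', 'Y'))   # 0.5*2.0 + 1.0 = 2.0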

From (3) we can write:

Yx = Tx + I + U      (4)
since I and U are not affected by the hypothetical change of X to x; moreover,
E(Yx|e) = Tx + E(I+U|e)      (5)
since x is a constant.

The last term in (5) can be evaluated by taking expectations, conditional on e, on both sides of (3), giving:

E(I+U|e) = E(Y|e) - T E(X|e)      (6)

which, substituted into (5), yields
E(Yx|e) = Tx + E(Y|e) - T E(X|e)      (7)

which, after rearranging terms, is precisely our target formula (1).
-------------------- QED

Some Familiar Problems Cast in Linear Outfits
Three special cases of e are worth noting:
Example-1. e: X = x', Y = y'
(The linear equivalent of the probability of causation.) From (1) we obtain directly:

E(Yx|Y=y', X=x') = y' + T (x - x')

This is intuitively compelling. The hypothetical expectation of Y is simply the observed value of Y, y', plus the anticipated change in Y due to the change x-x' in X.
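
In code, Example 1 is the abduction-action-prediction recipe applied to a single unit (a minimal sketch; the model Y = TX + U and all numbers are assumptions for illustration):

    T = 2.0                      # assumed total effect coefficient
    x_prime, y_prime = 1.0, 5.0  # observed evidence: X = x', Y = y'
    x = 3.0                      # hypothetical value of X

    u = y_prime - T * x_prime    # abduction: recover the unit's noise U
    y_x = T * x + u              # action + prediction: set X = x, recompute Y

    assert y_x == y_prime + T * (x - x_prime)   # matches the formula above
    print(y_x)                                  # 9.0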

Example-2. e: X = x' (effect of treatment on treated)

E(Yx|X=x') = E(Y|x') + T (x - x')
           = rx' + T (x - x')
           = rx' + E(Y|do(x)) - E(Y|do(x'))
where r is the regression coefficient of Y on X.

Example-3. e: Y = y'
(Gee, my temperature is Y = y'; what if I had taken x tablets of aspirin? How many did you take? Don't remember.)

E(Yx|Y=y') = y' + T [x - E(X|y')]
           = y' + E(Y|do(x)) - E[Y|do(X=r'y')]

where r' is the regression coefficient of X on Y.
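
The following sketch checks Example 3 numerically (the model and all numbers are illustrative assumptions; X is taken exogenous with unit variance, so that r' is simply Cov(X,Y)/Var(Y)), again by rejection sampling on the evidence Y ~ y':

    import numpy as np

    rng = np.random.default_rng(1)
    T, n = 2.0, 2_000_000
    X = rng.standard_normal(n)           # X = Ux
    Y = T * X + rng.standard_normal(n)   # Y = T*X + Uy

    y_prime, x_new, eps = 1.0, 3.0, 0.05
    keep = np.abs(Y - y_prime) < eps     # evidence e: Y ~ y'

    # Counterfactual: force X = x_new, keeping each unit's noise Uy = Y - T*X.
    Y_cf = T * x_new + (Y[keep] - T * X[keep])

    # Formula: y' + T*[x - r'*y'], with r' the regression slope of X on Y.
    r_prime = np.cov(X, Y)[0, 1] / Y.var()
    formula = y_prime + T * (x_new - r_prime * y_prime)

    print("Monte Carlo:", round(Y_cf.mean(), 4), " formula:", round(formula, 4))
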
Example-4. Let us consider the non-recursive, supply-demand model of page 215 in Causality (2000). Eqs. (7.9)-(7.10) read:
q = b1 p + d1 i + u1
p = b2 q + d2 w + u2

Our counterfactual problem (page 216) reads: Given that the current price is P=p0, what would be the expected value of the demand Q if we were to control the price at P = p1? Making the correspondence P = X, Q = Y, e = {P=p0, i, w}, we see that this problem is identical to Example 2 above (effect of treatment on the treated), subject to conditioning on i and w. Hence, since T = b1, we can immediately write

E(Qp1|p0, i, w) = E(Q|p0, i, w) + b1 (p1 - p0)
                = rp p0 + ri i + rw w + b1 (p1 - p0)      (8)

where rp, ri and rw are the coefficients of P, i and w, respectively, in the regression of Q on P, i and w.

Eq. (8) replaces Eq. (7.17) on page 217. Note that the parameters of the price equation

p = b2 q + d2 w + u2
only enter (8) via the regression coefficients. Thus, they need not be calculated explicitly if the regression coefficients are estimated directly by least squares.
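
A simulation sketch of Example 4 (all parameter values and disturbance distributions are arbitrary choices, and 1 - b1*b2 is assumed nonzero so the equilibrium is well defined) compares a rejection-sampling estimate of E(Qp1|p0, i, w) with the first line of Eq. (8); with i and w held fixed, E(Q|p0, i, w) is estimated directly from the retained samples:

    import numpy as np

    rng = np.random.default_rng(2)
    b1, d1, b2, d2 = 0.6, 1.0, -0.4, 0.5   # illustrative parameters
    i, w, p0, p1 = 1.0, -1.0, 0.8, 1.5
    n = 4_000_000

    u1, u2 = rng.standard_normal((2, n))
    # Solve the simultaneous equations q = b1*p + d1*i + u1 and
    # p = b2*q + d2*w + u2 for the equilibrium price and quantity:
    p = (b2 * (d1 * i + u1) + d2 * w + u2) / (1 - b1 * b2)
    q = b1 * p + d1 * i + u1

    eps = 0.01
    keep = np.abs(p - p0) < eps            # evidence: P ~ p0 (i, w fixed)

    # Counterfactual demand when price is controlled at p1 (u1 retained):
    Q_cf = b1 * p1 + d1 * i + u1[keep]

    # Eq. (8), first line: E(Q|p0,i,w) + b1*(p1 - p0).
    eq8 = q[keep].mean() + b1 * (p1 - p0)

    print("Monte Carlo:", round(Q_cf.mean(), 4), " Eq. (8):", round(eq8, 4))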

Remark 1:
Example 1 is not really surprising; we know that the probability of causation is empirically identifiable under the assumption of monotonicity (Causality, p. 293). But examples 2 and 3 trigger the following conjecture:

Conjecture
Any counterfactual query of the form P(Yx|e) is empirically identifiable when Y is monotonic relative to X.


It is good to end on a challenging note.
