Date: December 20-22, 2000
From: Bill Shipley, Universite de Sherbrooke, (Quebec) CANADA
Subject: Is the do(x) operator universal?
Bill Shipley asked:
In most experiments, the external
manipulation consists of adding (or subtracting) some amount from X
without removing pre-existing causes of X. For example, adding 5
kg/h of fertilizer to a field, adding 5 mg/l of insulin to subjects,
etc. Here, the pre-existing causes of the manipulated variable still
exert effects but a new variable (M) is added.
... The problem that I see with the do(x) operator as a general
operator of external manipulation is that it requires two things:
(1) removing any pre-existing causes of x and (2) setting x to some
value. This corresponds to some types of external manipulations, but
not all (or even most) external manipulations. I would introduce an
add(x=n) operator, meaning "add, external to the pre-existing causal
process, an amount 'n' of x".
Graphically, this consists of augmenting the pre-existing causal graph
with a new edge, namely M-n-->X. Algebraically, this would consist of
adding a new term -n- as a cause of X.
Author's answer:
In many cases, your "additive intervention" represents
indeed the only way we can intervene on a variable X.
In fact, the general notion of intervention
(Causality, page 113) involves replacing the equation of
X by any other equation that fits the circumstances, not
necessarily a constant X = x.
What you are proposing corresponds to replacing the old equation of X, x = f(paX), by a new equation, x = f(paX) + 1. This replacement is usually treated under the heading "instrumental variables", since it is equivalent to writing x = f(paX) + I (where I is an instrument) and varying I from 0 to 1.
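The difference between the two replacements can be seen in a minimal sketch. The linear model below (the coefficients 2 and 3, the parent z, the disturbances u_x and u_y, and the function name simulate) is purely hypothetical and not taken from this exchange; it only contrasts adding an instrument I to the equation of X with replacing that equation by a constant.

```python
def simulate(z, u_x, u_y, instrument=0.0, do_x=None):
    """Hypothetical linear model: x = 2*z + u_x + I, y = 3*x + z + u_y."""
    if do_x is None:
        # Additive manipulation: the old causes of X (z, u_x) still act.
        x = 2.0 * z + u_x + instrument
    else:
        # do(x): the equation of X is replaced by the constant do_x.
        x = do_x
    y = 3.0 * x + z + u_y
    return x, y


base = simulate(z=1.0, u_x=0.5, u_y=0.0)                    # unmanipulated system
added = simulate(z=1.0, u_x=0.5, u_y=0.0, instrument=1.0)   # I goes from 0 to 1
fixed = simulate(z=1.0, u_x=0.5, u_y=0.0, do_x=1.0)         # do(X = 1)

print("base :", base)
print("added:", added)
print("fixed:", fixed)
```

Under the additive manipulation, X moves by one unit but still tracks z; under do(x), X no longer responds to z at all.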
There are three points to notice:
1. The additive manipulation CAN be represented in the do( ) framework -- we merely apply the do( ) operator to the instrument I, and not to X itself. This is a different kind of manipulation that needs to be distinguished from do(x) because, as you noticed, the effect on y would be different.
2. Scientists working with instrumental variables (e.g., epidemiologists) are not satisfied with estimating the effect of the instrument on Y, but are trying hard to estimate the effect of X itself. The former is known as "the effect of intention to treat", the latter as "the effect of treatment" (see Causality, page 261).
3. Consider the loopy example where LISREL fails: y = bx + e1, x = ay + e2 + I, where the instrument I enters the equation of X. If we interpret "total effects" as the response of Y to a unit change of the instrument I, then LISREL's formula obtains: the effect of I on Y is b/(1-ab). However, if we adhere to the notion of "per unit change in X", as opposed to "per unit change in an instrument of X", we get back the do-formula: the effect of X on Y is b, not b/(1-ab), even though the manipulation is done through an instrument. In other words, we change I from 0 to 1 and observe the changes in X and in Y; if we divide the change in Y, b/(1-ab), by the change in X, 1/(1-ab), we get b, not b/(1-ab).
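The arithmetic in point 3 can be checked with a short sketch. It assumes that the instrument I enters the equation of X, i.e. x = ay + e2 + I, which is the placement under which the quoted b/(1-ab) arises. The numerical values of a, b, e1 and e2 and the function name solve_loop are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def solve_loop(a, b, e1, e2, instrument):
    """Solve y = b*x + e1 and x = a*y + e2 + I simultaneously (needs a*b != 1)."""
    x = (a * e1 + e2 + instrument) / (1 - a * b)
    y = b * x + e1
    return x, y


a, b = Fraction(1, 2), Fraction(1, 3)   # hypothetical coefficients
e1, e2 = Fraction(1), Fraction(2)       # hypothetical disturbances, held fixed

x0, y0 = solve_loop(a, b, e1, e2, instrument=0)
x1, y1 = solve_loop(a, b, e1, e2, instrument=1)

effect_of_instrument = y1 - y0              # change in Y per unit change in I
effect_per_unit_x = (y1 - y0) / (x1 - x0)   # change in Y per unit change in X

print("effect of I on Y:", effect_of_instrument, "vs b/(1-ab) =", b / (1 - a * b))
print("effect of X on Y:", effect_per_unit_x, "vs b =", b)
```

Exact fractions are used so that the two ratios can be compared without rounding.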
To summarize: Yes, additive manipulation is sometimes useful to model; normally it is done through instrumental variables, and we still need to distinguish between the effect of the instrument and the effect of X. The former is not stable (Causality, page 261); the latter is. LISREL's formula corresponds to the effect of an instrument, not to the effect of X.
Bill Shipley further asked:
Thanks for the clarification. It seems to me that the simplest, and
most straightforward, way of modeling and representing manipulations
of a causal system is to simply (1) modify the causal graph of the
unmanipulated system to represent the proposed manipulation, (2)
translate this new graph into structural equations, and (3) derive
predictions (including conditional predictions) from the resulting
equations; this is how I have treated the notion in my book. Why
worry about do(x) at all? In particular, one can model quite
sophisticated manipulations this way. For
instance, one might well ask what would happen if one added an
amount z to some variable x in the causal graph, in which z is
dependent on some other variable in the graph.
Author's Reply:
If the manipulation is sophisticated,
then we need to go back to the equations and specify
precisely what is being changed, how, with what instrument,
conditioned on what information, etc., and then impose the
appropriate modification on the model to account for these nuances.
(See, e.g., the example of "process control", page 74 of my book.)
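As a rough sketch of what "imposing the appropriate modification on the model" might look like, the toy model below replaces the equation of x by one that adds an amount depending on another variable w. All names (evaluate, model, manipulated, the variables w, x, y) and all coefficients are hypothetical, and the model is assumed to be recursive so that the equations can be evaluated in causal order.

```python
def evaluate(equations, exogenous):
    """Evaluate structural equations listed in causal order."""
    values = dict(exogenous)
    for name, equation in equations.items():  # dicts keep insertion order
        values[name] = equation(values)
    return values


# Hypothetical unmanipulated model: w -> x -> y, with w also affecting y.
model = {
    "w": lambda v: v["u_w"],
    "x": lambda v: 2.0 * v["w"] + v["u_x"],
    "y": lambda v: 3.0 * v["x"] + v["w"],
}

# Manipulation: add an amount 0.5*w to x, leaving its old causes in place.
old_x = model["x"]
manipulated = dict(model)
manipulated["x"] = lambda v: old_x(v) + 0.5 * v["w"]

disturbances = {"u_w": 2.0, "u_x": 1.0}
before = evaluate(model, disturbances)
after = evaluate(manipulated, disturbances)

print("before:", before["x"], before["y"])
print("after :", after["x"], after["y"])
```

Only the equation of x is swapped; every other equation, and the disturbances, are left untouched.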
However, science thrives on standards, because standards serve (at least) two purposes: communication and theoretical focus.
Mathematicians, for example, have decided that the derivative operator "dy/dx" is a nice standard for communicating information about change. So that is what we teach in calculus, although other operators might also serve the purpose, for example, x dy/dx or (dy/dx)/y, etc.
1. Communication: If we were to eliminate the term "treatment effect" from epidemiology, and replace it with detailed descriptions of how the effect was measured, we would practically choke all communication among epidemiologists. A standard was therefore established: what we measure in a controlled randomized experiment will be called "treatment effect"; the rest will be considered variations on the theme. The "do-operator" represents this standard faithfully.
The same goes for SEM. Sewall Wright talked about "effect coefficients" and established them as the standard of "direct effect" in path analysis (before it got molested with regressional jargon, and LISREL formulas). Again, the "do-operator" conforms directly to this standard.
2. Theoretical focus: Many (if not all) of the variants of manipulations can be reduced to "do", or to several applications of "do". Theoretical results established for "do" are then applicable to those variants. For example, Les Hayduk's "poke and release" manipulation is expressible as "do" in the temporal unfolding of a structural model. Another example: questions of identification for expressions involving "do" are applicable to questions of identification of more sophisticated effects. On page 113 of Causality, I show that if the total effect P(y|do(x)) is identifiable, then so also is the effect of conditional actions P(y|do(x if Z=z)). The same goes for many other theoretical results in the book; they were developed for the "do" operator, they borrow from each other, and they are applicable to many variants.
Finally, the do operator is the appropriate operator for interpreting the conditional part of counterfactual sentences (see page 204), and counterfactuals are abundant in scientific discourse (see pages 217-219). I have yet to see a competing candidate with comparable versatility, generality, formal power and (not the least) conceptual appeal. (Correction: I have yet to see ANY competing candidate.)