I have a hard time understanding what counterfactuals are actually useful for. To me, they seem to be answering the wrong question. In your book, you give at least a couple of different reasons for when one would need the answer to a counterfactual question, so let me tackle these separately:
A further example is that on page 323 of your book: the desert traveler. Surely, both Enemy-1 and Enemy-2 are equally 'guilty' for trying to murder the traveler. Attempted murder should equal murder. In my mind, the only rationale for giving a shorter sentence for attempted murder is that the defendant is apparently not so good at murdering people so it is not so important to lock him away... (?!)
In decision making, the thing we want to estimate is P(future | do(action), see(context)). This is of course a regular do-probability, not a counterfactual query. So why do we need to compute counterfactuals?
In your example in section 7.2.1, your query (3) is: "Given that the current price is P=p_{0}, what would be the expected value of the demand Q if we were to control the price at P=p_{1}?" You argue that this is counterfactual. But what if we introduce into the graph new variables Qtomorrow and Ptomorrow, with parent sets (U_{1}, I, Ptomorrow) and (W, U_{2}, Qtomorrow), respectively, and with the same connection strengths d_{1}, d_{2}, b_{2}, and b_{1}? Now query (3) reads: "Given that we observe P=p_{0}, what would be the expected value of the demand Qtomorrow if we perform the action do(Ptomorrow=p_{1})?" This is exactly the same question, but it is not counterfactual; it is just P(Qtomorrow | do(Ptomorrow=p_{1}), see(P=p_{0})). Obviously, we get the correct answer by doing the counterfactual analysis, but the question per se is no longer counterfactual and can be computed using regular do( )-machinery. I guess this is the idea of your 'twin network' method of computing counterfactuals. In this case, why say that we are computing a counterfactual when what we really want is prediction (i.e. a regular do-expression)?
To put it in the most simplified form, my argument is the following: Regardless of whether we represent individuals, businesses, organizations, or government, we are constantly faced with decisions of how to act (and these are the only decisions we have!). What we want to know is what will likely happen if we act in particular ways. So what we want to know is P(future | do(action), see(context)). We neither want nor need the answers to counterfactuals.
Where does my reasoning go wrong?
Author reply
"In decision making, the thing we want to estimate is P(future | do(action), see(context)). This is of course a regular do-probability, not a counterfactual query. So why do we need to compute counterfactuals?"
The answer is that, in certain cases, the variables entering into "context" are CONSEQUENCES of the "action", and the expression P(y|do(x), z) is defined as the probability of y given that we do X=x and LATER observe Z=z, which is not the probability of y given that we first observe Z=z and then do X=x.
This confusion disappears of course when we have a sequential, time-indexed model. But, working with static models as in my book, we do not have the language to express the probability of Y=y given that we first observe Z=z and then do X=x. Counterfactuals give us a way of expressing this probability, by writing P(Y_{x} = y | z), the probability that Y would be y had X been x, given that Z=z is observed.
I have elaborated on this point in
Pearl, J., "The logic of counterfactuals in causal inference
(Discussion of 'Causal inference without counterfactuals' by A.P. Dawid),"
Journal of the American Statistical Association, Vol. 95, No. 450,
428-435, June 2000.
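The difference between P(y | do(x), z), where Z is observed after the action, and the counterfactual P(Y_{x} = y | z), where Z is observed before it, can be checked numerically. The following is only a minimal sketch on a hypothetical binary chain X -> Z -> Y; the structural equations, the probabilities in P_ONE, and the function names are assumptions made for illustration and do not come from the book.

```python
from itertools import product

# Hypothetical toy model (an assumption, not from the text): a binary chain X -> Z -> Y
#   X := U_x,   Z := X xor U_z,   Y := Z xor U_y
P_ONE = {"ux": 0.5, "uz": 0.2, "uy": 0.1}  # assumed P(U = 1) for each background variable


def worlds():
    """Yield every background state u together with its prior probability."""
    for vals in product((0, 1), repeat=len(P_ONE)):
        u = dict(zip(P_ONE, vals))
        weight = 1.0
        for name, val in u.items():
            weight *= P_ONE[name] if val else 1.0 - P_ONE[name]
        yield u, weight


def solve(u, do_x=None):
    """Solve the structural equations; do_x, if given, replaces the equation for X."""
    x = u["ux"] if do_x is None else do_x
    z = x ^ u["uz"]
    y = z ^ u["uy"]
    return x, z, y


def p_do_then_see(x, z_obs):
    """P(Y=1 | do(X=x), Z=z_obs): Z is observed AFTER the action."""
    num = den = 0.0
    for u, w in worlds():
        _, z, y = solve(u, do_x=x)  # both Z and Y come from the modified model
        if z == z_obs:
            den += w
            num += w * y
    return num / den


def p_see_then_do(x, z_obs):
    """P(Y_x=1 | Z=z_obs): Z is observed BEFORE the action (counterfactual)."""
    num = den = 0.0
    for u, w in worlds():
        _, z_actual, _ = solve(u)      # evidence comes from the undisturbed model
        _, _, y_cf = solve(u, do_x=x)  # outcome comes from the modified model
        if z_actual == z_obs:
            den += w
            num += w * y_cf
    return num / den


print(p_do_then_see(x=1, z_obs=0))
print(p_see_then_do(x=1, z_obs=0))
```

With these assumed numbers the first quantity is about 0.1 and the second about 0.74, so the order of observing and acting matters whenever Z is a consequence of X.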
Thanks for your illuminating questions. I hope that they, together with my attempted answers, will help other readers with similar difficulties.
Best wishes,
Judea Pearl
Date: February 22, 2006
From: Dr. Patrik Hoyer (University of Helsinki, Finland)
Subject: The meaning of counterfactuals
On Question 1
My view on your example is that if the drug company followed
the law in doing all the required tests, not "cutting corners," not
suppressing early tests showing strange results, not hiding
information, etc., then I would not consider the company responsible
for the death. Rather, I would consider it an accident, similar to the
damage caused by an earthquake, a tornado, or a car crash due to an elk
crossing the street. Nobody would be legally responsible for the death,
but of course the family could get insurance money from a private (or
public-sector) insurance against accidental death.
Of course, had the company deliberately tried to silence strange test results, or not done all the tests required, or in some other way broken the law on how medicines should be developed, then the company (and in particular the people responsible for the practice within the company) should be held responsible.
On Question 2
My notion of causality is strongly tied to the notion of time, so I
have a hard time with your explanation.
First, isn't a "sequential, time-indexed model" really what we would like? At least it fits nicely with my intuition about causality; much better than any 'static' model. So, if counterfactuals are not needed in such a model then in my mind they are not needed at all...
Second, again my intuition of causality is so strongly connected to time that I can't understand how one can first observe Z and then do X if Z is a descendant of X. If this is physically possible then I would call the new controlled variable X' and then of course Z is not a descendant of X' (since Z happens before X') and again we can get by with regular do-probabilities.
Author Reply On Question 2
A static model IS a short-hand notation for a
"sequential, time-indexed model".
When an engineer draws a circuit diagram, he is building a
static model, which saves miles and miles of drawing the
sequential model equivalent.
The meaning of the static model is the sequence of models M_{1}, M_{2}, ..., M_{n},
where M_{i} stands for the model at time t_{i},
and M_{1} = M_{2} = ... = M_{n}.
Now, suppose in M_{2} we have a chain of gates, we observe Z_{3} at time t_{3}, and we want to know the causal effect of X_{4} on Y_{5}. We can do this exercise through do-calculus, with all the necessary indices and the replicated models. But we can do it much more neatly in the static model, using counterfactuals: P(Y_{x} = y | z) will give us the correct answer.
Isn't it a nice invention??
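The correspondence between the replicated, time-indexed model and the static one can be sketched in a few lines. This is only an illustration: the binary mechanism, the numbers in P_ONE, and the function names are hypothetical, and the sketch assumes (the text does not say so explicitly) that the replicas share the same background variables U. The first function evaluates a regular do-expression in which the observed Z belongs to an earlier replica; the second evaluates P(Y_{x} = y | z) in the single static model using the usual abduction, action, and prediction steps.

```python
from itertools import product

# Assumed toy mechanism (hypothetical), identical in every replica:
#   X := U_x,   Z := X xor U_z,   Y := Z xor U_y
# Assumption: the background variables U persist from one replica to the next.
P_ONE = {"ux": 0.5, "uz": 0.2, "uy": 0.1}  # assumed P(U = 1)


def worlds():
    """Yield every background state u together with its prior probability."""
    for vals in product((0, 1), repeat=len(P_ONE)):
        u = dict(zip(P_ONE, vals))
        weight = 1.0
        for name, val in u.items():
            weight *= P_ONE[name] if val else 1.0 - P_ONE[name]
        yield u, weight


def static_model(u, do_x=None):
    """One copy of the mechanism; do_x, if given, replaces the equation for X."""
    x = u["ux"] if do_x is None else do_x
    z = x ^ u["uz"]
    return {"X": x, "Z": z, "Y": z ^ u["uy"]}


def unrolled_model(u, do_x2=None):
    """Two replicas sharing u; only X in the second replica can be set."""
    out = {name + "1": val for name, val in static_model(u).items()}
    out.update({name + "2": val for name, val in static_model(u, do_x2).items()})
    return out


def do_query_unrolled(x2, z1_obs):
    """Regular do-expression P(Y2=1 | do(X2=x2), Z1=z1_obs); Z1 precedes X2."""
    num = den = 0.0
    for u, w in worlds():
        v = unrolled_model(u, do_x2=x2)
        if v["Z1"] == z1_obs:
            den += w
            num += w * v["Y2"]
    return num / den


def counterfactual_static(x, z_obs):
    """P(Y_x=1 | Z=z_obs) in the static model."""
    # Abduction: keep the background states compatible with the evidence Z = z_obs.
    posterior = [(u, w) for u, w in worlds() if static_model(u)["Z"] == z_obs]
    total = sum(w for _, w in posterior)
    # Action and prediction: set X = x and average Y over the posterior.
    return sum(w * static_model(u, do_x=x)["Y"] for u, w in posterior) / total


print(do_query_unrolled(x2=1, z1_obs=0), counterfactual_static(x=1, z_obs=0))
```

Under the shared-U assumption the two numbers agree, which is the sense in which the static counterfactual stands in for the indexed do-expression without drawing the replicas.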
You say:
"So, if counterfactuals are not needed in such a model then in my mind they are not needed at all..."