Question to author:
If your assumption, that controlling X at x is equivalent to
removing the function for X and putting X=x elsewhere,
is applicable, then it makes sense because, from my
last paragraph, we need past information to select the correct function.
What I do not understand at the moment is the relevance of this to decision
trees. At a decision node, one conditions on the quantities known at the
time of the decision. At a random node, one includes all relevant uncertain
quantities under known conditions. Nothing more than the joint distributions
(and utility considerations) are needed. For example, in the medical case,
the confounding factor may either be known or not at the time the decision
about treatment is made, and this determines the structure of the tree.
Where causation may enter is when the data are used to assess the
probabilities needed in the tree, and it is here that Novick and I used
exchangeability. The Bayesian paradigm makes a sharp distinction between
probability as belief and probability as frequency, calling the latter
"chance." If I understand causation, it would be reasonable that
our concept could conveniently be replaced by yours in this context.
Author's reply:
Many decision analysts take the position that causality
is not needed because: "Nothing more than the joint distributions
(and utility considerations) are needed." (see discussion
with Nimrod Megiddo posted on this page). I certainly agree that
joint distributions are all that is needed, because
P(y|do(x)) is indeed a well-defined distribution
function, and this distribution is the target of
causal analysis. What is special about this distribution,
however, is that it is not derivable from the joint
distribution P(y,x), unless we add causal knowledge,
such as the one provided by a causal graph.
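To make this point concrete, here is a minimal sketch (with illustrative numbers of my own, not taken from the discussion) of two causal models that agree on the joint distribution P(x,y) yet disagree on P(y|do(x)); conditioning is computable from the joint alone, but the interventional quantity depends on the graph:

```python
# Two causal models that share the SAME joint distribution P(x, y)
# but give DIFFERENT interventional distributions P(y | do(x)).
# Illustrative numbers only.

# Joint distribution P(X, Y) over binary variables.
P = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

def P_y_given_x(y, x):
    """Ordinary conditional P(Y=y | X=x), computable from the joint alone."""
    return P[(x, y)] / (P[(x, 0)] + P[(x, 1)])

def P_y(y):
    """Marginal P(Y=y)."""
    return P[(0, y)] + P[(1, y)]

# Model A: X -> Y.  Intervening on X leaves Y's mechanism intact,
# so P(y | do(x)) = P(y | x).
do_A = P_y_given_x(1, 1)

# Model B: Y -> X.  Intervening on X severs the only arrow into X and
# leaves Y's (exogenous) mechanism untouched, so P(y | do(x)) = P(y).
do_B = P_y(1)

print(do_A)  # -> 0.8
print(do_B)  # -> 0.5
```

The two models are indistinguishable from data on (X, Y) alone; only the extra causal knowledge of which arrow is present decides which number enters the decision tree.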
Your next sentence says it all:
Where causation may enter is when the data are used to assess
the probabilities needed in the tree,...
This is precisely the way I think about causation, perhaps more daringly. I would not restrict myself to "data" but would also include "beliefs," results of various experiments, and even plain scientific knowledge (Ohm's law, Newtonian mechanics, etc.).
I take the frequency/belief distinction to be tangential to discussions of causality. Let us assume that the tables in Simpson's story were not frequency tables but summaries of one's subjective beliefs about the occurrence of various joint events, (C,E,F), (C,E,-F), and so on. My assertion remains that this summary of beliefs is not sufficient for constructing our decision tree.

We also need to assess our belief in the hypothetical event "E would occur if the decision do(C) is taken" and, as I have emphasized (and demonstrated), temporal information alone is insufficient for deriving this assessment from the tabulated belief summaries. Hence we cannot construct the decision tree from this belief summary; we need an extra ingredient, which I call "causal" information and you choose to call "exchangeability" -- I would not quarrel about nomenclature.
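The insufficiency can be checked numerically. The sketch below (with illustrative belief numbers of my own, not the original tables) builds one belief table over (C,E,F). If F is a confounder (F -> C, F -> E, C -> E), then P(E | do(C)) is obtained by adjusting for F; if instead C preceded F (say C -> F -> E), conditioning would suffice. The same table then supports opposite recommendations, which is Simpson's reversal:

```python
# One Simpson-style belief table over (C, E, F); illustrative numbers only.
from itertools import product

# Generate the joint beliefs P(C, E, F) from a confounded story.
P_F = {1: 0.5, 0: 0.5}
P_C_given_F = {1: 0.9, 0: 0.1}                 # P(C=1 | F=f)
P_E_given_CF = {(1, 1): 0.85, (0, 1): 0.90,    # P(E=1 | C=c, F=f)
                (1, 0): 0.30, (0, 0): 0.40}

joint = {}
for c, e, f in product([0, 1], repeat=3):
    pc = P_C_given_F[f] if c == 1 else 1 - P_C_given_F[f]
    pe = P_E_given_CF[(c, f)] if e == 1 else 1 - P_E_given_CF[(c, f)]
    joint[(c, e, f)] = P_F[f] * pc * pe

def cond(e, c):
    """P(E=e | C=c): what plain conditioning on the table gives."""
    num = sum(joint[(c, e, f)] for f in (0, 1))
    den = sum(joint[(c, e2, f)] for e2 in (0, 1) for f in (0, 1))
    return num / den

def do(e, c):
    """P(E=e | do(C=c)) under the confounded graph: adjust for F."""
    total = 0.0
    for f in (0, 1):
        pf = sum(joint[(c2, e2, f)] for c2 in (0, 1) for e2 in (0, 1))
        pe_cf = joint[(c, e, f)] / sum(joint[(c, e2, f)] for e2 in (0, 1))
        total += pe_cf * pf
    return total

print(cond(1, 1), cond(1, 0))  # ~0.795 vs ~0.450 : C looks beneficial
print(do(1, 1), do(1, 0))      # ~0.575 vs ~0.650 : do(C) is harmful
```

Nothing in the table itself says which of the two readings is right; that is exactly the extra ingredient the decision tree requires.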
Next discussion (Lindley (3): On causality and decision trees (cont.))