From: Nimrod Megiddo (IBM Almaden)

Subject: Simpson's paradox and decision trees

I do not agree that "causality" is the key to resolving the paradox (but this is also a matter of definition), or that tools for looking at it did not exist twenty years ago. Coming from game theory, I think the issue is not difficult for people who like to draw decision trees with "decision" nodes distinguished from "chance" nodes.

I drew two such trees in the attached Word document, which I
think clarify the correct decision in different circumstances.

Click here to view the trees.

**Author's reply:**

The fact that you have constructed two different
decision trees for the same input tables implies that
the key to the construction was not in the data, but
in some information you obtained from the story
behind the data. What is that information?

The literature on decision-tree analysis has indeed existed for at least fifty years but, to the best of my knowledge, it has not dealt seriously with the problem posed above: "what information do we use to guide us in setting up the correct decision tree?"

We agree that giving a robot
the frequency tables ALONE would not be sufficient
for the job. But what else would
Mr. Robot (or a statistician) need? Changing the story from
*F* = "female" to *F* = "blood pressure" seems to be enough
for people, because people understand informally the distinct
roles that gender and blood pressure play in the scheme of things.
Can we characterize these roles formally,
so that our robot would be able to construct the correct
decision tree?

My proposal: give the robot (or a statistician or a
decision-tree expert) a pair (*T, G*), where *T* is the
set of frequency tables and *G* is a causal graph and,
lo and behold, the robot would be able to set up the
correct decision tree automatically.
This is what I meant by saying that the resolution of
the paradox lies in causal considerations.
Moreover, one can go further and argue: "if the information in
(*T, G*) is sufficient, why not
skip the construction of a decision tree altogether,
and get the right answer directly from (*T, G*)?"
This is the gist of chapters 3-4 in the book,
which can be a topic for a separate
discussion: Would the rich literature on decision tree
analysis benefit from conversion to the more economical
encoding of decision problems in the syntax of (*T, G*)?
The introduction of influence diagrams (in 1981) was a step
in this direction and, as Section 4.1.2 indicates, the second
step might not be too far off.
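As a sketch of this proposal, here is a toy "robot" that receives a pair (*T, G*) and returns the treatment effect directly, with no decision tree. Everything in it, the frequency counts, the graph encoding, and the simple pre-/post-treatment test it applies, is an illustrative assumption rather than the book's procedure:

```python
# T: hypothetical (recovered, total) counts, stratified by F.
T = {
    True:  {"C": (2, 10),  "noC": (9, 30)},   # stratum where F holds
    False: {"C": (18, 30), "noC": (7, 10)},   # stratum where F fails
}

def descendants(G, node):
    """All nodes reachable from `node` in graph G (child-list form)."""
    seen, stack = set(), [node]
    while stack:
        for child in G.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def effect(T, G):
    """Estimate P(E | do(C)) - P(E | do(not C)) from (T, G).

    If F is a descendant of C (post-treatment, like blood pressure),
    aggregate over F; otherwise (pre-treatment, like gender) adjust
    for F: average the stratum effects weighted by P(F)."""
    if "F" in descendants(G, "C"):
        rec_c = sum(T[f]["C"][0] for f in T)
        tot_c = sum(T[f]["C"][1] for f in T)
        rec_n = sum(T[f]["noC"][0] for f in T)
        tot_n = sum(T[f]["noC"][1] for f in T)
        return rec_c / tot_c - rec_n / tot_n
    total = sum(T[f]["C"][1] + T[f]["noC"][1] for f in T)
    return sum(
        (T[f]["C"][1] + T[f]["noC"][1]) / total
        * (T[f]["C"][0] / T[f]["C"][1] - T[f]["noC"][0] / T[f]["noC"][1])
        for f in T
    )

G_gender   = {"F": ["C", "E"], "C": ["E"]}   # F -> C, F -> E, C -> E
G_pressure = {"C": ["F", "E"], "F": ["E"]}   # C -> F -> E, C -> E

print(effect(T, G_gender))    # negative: do not treat
print(effect(T, G_pressure))  # positive: treat
```

Under the gender graph the robot stratifies and advises against the treatment; under the blood-pressure graph it aggregates and advises for it, even though the tables *T* are identical in both calls.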

**From: Nimrod Megiddo (IBM Almaden)
Subject: Simpson's paradox and decision trees (cont.)**

My point remains simply the following. The
term "causality" introduces into the problem issues that
do not have to be there, such as determinism, free will,
cause and effect, etc. What does matter is a specification
that, in the outcome-fixing process, the value of
variable *X* is fixed before the value of variable *Y*, and
*Y* depends on *X*. You like to call this situation a causality
relation. Of course in a mathematical theory you can choose
any name you like, but then people are initially tempted to
develop some intuition, which may be wrong due to the external
meaning of the name you choose. The interpretation of this
intuition outside the mathematical model often has real-life
implications that may be wrong, for example, that *X* really
causes *Y*. The decision tree is simply a way to demonstrate
the additional chronological information, and simple directed
graphs can of course encode that information more concisely.
When you have to define precisely what these graphs mean, you
refer to a fuller description like the trees.

So, in summary, my only objection is to the use of the word "causality"; I never had doubts that chronological-order information is crucial to correct decision making based on past data.

**Author's reply:**

- I agree that there is some danger in attributing to
certain mathematical relationships labels such as
"causal", which are loaded with intuition and controversy.
However, if in addition to getting the mathematics
right, one is also interested in explicating those
valuable intuitions, so that we can interpret them more
precisely and even teach them to robots, then there is no escape
but to label those relationships with whatever names they
currently enjoy in our language, namely, "causal".
And, BTW, the real-life implications of this exercise are
not wrong --
*X* really causes *Y* when the formal conditions hold; I would gladly retract this statement at the sight of the first counterexample.
- There is more that enters a decision tree than chronological
and dependence information. For example, the chronological and
dependence information conveyed by Figure 6.2(c) is identical
to that of Figure 6.2(a) (assuming *F* occurs before *C*), yet (c)
calls for a different decision tree (and yields a different
conclusion), because the dependence between *F* and *E* is
"causal" in (a) and statistical in (c). Thus, causal
considerations must supplement chronological and dependence
information if we are to construct correct decision trees and to
load their branches with correct probabilities. As a thought
experiment, imagine that we wish to write a program that
automatically constructs decision trees from stories like those
in Figures 6.2(a)-(c). The program is given the empirical
frequency tables and is allowed to ask us questions about
chronological and dependence relationships among *C*, *E*, and
*F*, but is not allowed to use any causal vocabulary. Would the
program be able to distinguish between (a) and (c)? Note that
all statistical-dependence information can be obtained from the
frequency tables and, moreover, dependence information relative
to manipulating the control variable (which Section 1.5 defines
as "causal" information) would not, in itself, be sufficient.
See Section 6.3 for a discussion of why the program will fail.
- I do not agree that, to define precisely what causal graphs
mean, we must "refer to a fuller description like the trees".
(A similar position has been advocated by Glenn Shafer.)
Section 7.1 gives a formal definition of causal graphs as a
collection of functions, and this definition invokes no decision
trees (at least not explicitly). Thus, a causal graph has a
meaning of its own, independent of the many decision trees that
the graph may help us construct. By analogy, it would be awkward
(though not mathematically wrong) to say that the meaning of a
differential equation (say, for particle motion) lies in the set
of trajectories that obey it; we can discuss the meaning and
adequacy of each term in the differential equation from first
principles, without having a clue about the solution to the
equation.
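To make the "collection of functions" reading concrete, here is a minimal structural-model sketch. The variable names, mechanisms, and probabilities are assumptions for illustration, not the book's definitions: each variable is computed by a function of its parents plus exogenous noise, and an intervention simply overrides one function, with no decision tree consulted.

```python
import random

# Sketch of a causal graph C -> F -> E, C -> E as a collection of
# functions (a structural model). All mechanisms and probabilities
# below are made up for illustration.

def sample(do_c=None):
    """Draw one unit (c, f, e). Passing `do_c` replaces the mechanism
    for C with a constant -- an intervention, not a conditioning."""
    u_c, u_f, u_e = (random.random() for _ in range(3))
    c = do_c if do_c is not None else (u_c < 0.5)     # treatment
    f = u_f < (0.8 if c else 0.3)                     # e.g. blood pressure
    e = u_e < (0.7 if (c and not f) else 0.4)         # recovery
    return c, f, e

def p_recovery(do_c, n=20000, seed=1):
    """Monte-Carlo estimate of P(E | do(C = do_c))."""
    random.seed(seed)
    return sum(sample(do_c)[2] for _ in range(n)) / n

print(p_recovery(True))   # about 0.8*0.4 + 0.2*0.7 = 0.46
print(p_recovery(False))  # about 0.40
```

The graph's meaning here is carried entirely by the three functions; the decision trees one could draw from the model are derived objects, as the reply argues.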

Next Discussion (Kenny: *Causality and the mystical error terms*)