I do not agree that "causality" is the key to resolving the paradox (though this is also a matter of definition), or that tools for looking at it did not exist twenty years ago. Coming from game theory, I think the issue is not difficult for people who like to draw decision trees with "decision" nodes distinguished from "chance" nodes.
I drew two such trees in the attached Word document, which I think clarify the correct decision in different circumstances.
Click here to view the trees.
The fact that you have constructed two different decision trees for the same input tables implies that the key to the construction was not in the data, but in some information you obtained from the story behind the data. What is that information?
The literature on decision-tree analysis has indeed existed for at least fifty years but, to the best of my knowledge, it has not dealt seriously with the problem posed above: what information do we use to guide us in setting up the correct decision tree?
We agree that giving a robot the frequency tables ALONE would not be sufficient for the job. But what else would Mr. Robot (or a statistician) need? Changing the story from F = "female" to F = "blood pressure" seems to be enough for people, because people understand informally the distinct roles that gender and blood pressure play in the scheme of things. Can we characterize these roles formally, so that our robot would be able to construct the correct decision tree?
My proposal: give the robot (or a statistician or a decision-tree expert) a pair (T, G), where T is the set of frequency tables and G is a causal graph and, lo and behold, the robot would be able to set up the correct decision tree automatically. This is what I meant by saying that the resolution of the paradox lies in causal considerations. Moreover, one can go further and argue: "if the information in (T, G) is sufficient, why not skip the construction of a decision tree altogether, and get the right answer directly from (T, G)?" This is the gist of chapters 3-4 in the book, which can be a topic for a separate discussion: Would the rich literature on decision tree analysis benefit from conversion to the more economical encoding of decision problems in the syntax of (T, G)? The introduction of influence diagrams (in 1981) was a step in this direction and, as Section 4.1.2 indicates, the second step might not be too far off.
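The proposal above can be illustrated with a minimal sketch (not the book's algorithm, and with illustrative counts and edge sets I have made up): the same frequency tables T, paired with two different causal graphs G, yield opposite treatment recommendations. When F is a parent of the treatment C (a confounder, as when F = gender), the robot compares recovery rates within each F stratum; when F is a descendant of C (as when F = blood pressure, affected by the drug), it compares the aggregate rates.

```python
# T: counts[(f, c)] = (recovered, total), classic Simpson-type numbers
# in which treatment wins in both strata yet loses in the aggregate.
counts = {
    ('F=0', 'treat'):   (81, 87),     # 93% recovery
    ('F=0', 'control'): (234, 270),   # 87%
    ('F=1', 'treat'):   (192, 263),   # 73%
    ('F=1', 'control'): (55, 80),     # 69%
}

def rate(pairs):
    """Pooled recovery rate over a list of (recovered, total) pairs."""
    return sum(r for r, n in pairs) / sum(n for r, n in pairs)

def recommend(counts, graph):
    """Pick 'treat' or 'control' from the pair (T, G).

    graph is a set of directed edges over {C, E, F}.  The only feature
    consulted is whether F points into C (stratify) or C points into F
    (aggregate) -- a crude stand-in for a full adjustment criterion.
    """
    strata = ('F=0', 'F=1')
    if ('F', 'C') in graph:   # F -> C: F is pre-treatment, stratify on it
        wins = sum(rate([counts[(f, 'treat')]]) > rate([counts[(f, 'control')]])
                   for f in strata)
        return 'treat' if wins == len(strata) else 'control'
    else:                     # C -> F: F is post-treatment, aggregate over it
        t = rate([counts[(f, 'treat')] for f in strata])
        c = rate([counts[(f, 'control')] for f in strata])
        return 'treat' if t > c else 'control'

confounder_graph = {('F', 'C'), ('F', 'E'), ('C', 'E')}  # F = gender
mediator_graph   = {('C', 'F'), ('F', 'E'), ('C', 'E')}  # F = blood pressure

print(recommend(counts, confounder_graph))  # -> treat
print(recommend(counts, mediator_graph))    # -> control
```

The table T is identical in both calls; only G changes, which is the sense in which the answer is "not in the data" but in the graph supplied with it.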
From: Nimrod Megiddo (IBM Almaden)
Subject: Simpson's paradox and decision trees (cont.)
My point remains simply the following. The term "causality" introduces into the problem issues that do not have to be there, such as determinism, free will, cause and effect, etc. What does matter is a specification that, in the outcome-fixing process, fixing the value of variable X occurs before fixing the value of a variable Y, and Y depends on X. You like to call this situation a causality relation. Of course, in a mathematical theory you can choose any name you like, but then people are initially tempted to develop some intuition, which may be wrong due to the external meaning of the name you choose. The interpretation of this intuition outside the mathematical model often has real-life implications that may be wrong, for example, that X really causes Y. The decision tree is simply a way to display this additional chronological information, and simple directed graphs can of course encode that information more concisely. When you have to define precisely what these graphs mean, you refer back to a fuller description like the trees.
So, in summary, my only objection is to the use of the word "causality," and I never had doubts that chronological-order information is crucial to correct decision making based on past data.
As a thought experiment, imagine that we wish to write a program that automatically constructs decision trees from stories like those in Fig. 6.2(a)-(b)-(c). The program is given the empirical frequency tables and is allowed to ask us questions about chronological and dependence relationships among C, E, and F, but is not allowed to use any causal vocabulary. Would the program be able to distinguish between (a) and (c)? Note that all statistical-dependence information can be obtained from the frequency tables and, moreover, that dependence information relative to manipulating the control variable (which Section 1.5 defines as "causal" information) would not, in itself, be sufficient. See Section 6.3 for a discussion of why the program will fail.
Next Discussion (Kenny: Causality and the mystical error terms)