From: Eliezer S. Yudkowsky, Research Fellow, Singularity Institute for Artificial Intelligence, Santa Barbara, CA

Subject: The validity of

**Question to author:**

The following paragraph appears on p. 103, shortly after eq. 3.63 in
my copy of *Causality*:

"To place this result in the context of our analysis in this chapter, we note
that the class of semi-Markovian models satisfying assumption (3.62)
corresponds to complete DAGs in which all arrowheads pointing to $X_k$ originate
from observed variables."

It looks to me like this is a sufficient, but not necessary, condition to
satisfy (3.62). It appears to me that the necessary condition is that no
confounder exist between any $X_i$ and

It is also not necessary that the DAG be complete.

**Author reply:**

You are right that the DAG need not be complete,
and that the condition cited on p. 103 is sufficient
but not necessary for either (3.62) or the *G*-estimation
formula (3.63) to hold. Corrections
to the wording of page 103 were posted on this website.
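For readers without the book at hand, here is the g-formula in the form commonly used in the literature, for actions $X_1, \ldots, X_n$ taken in that order, with $L_k$ denoting the covariates observed after $X_{k-1}$ and before $X_k$. This notation is an assumption and may differ in detail from the book's own statement of (3.63):

$$
P(y \mid \hat{x}_1, \ldots, \hat{x}_n) = \sum_{l_1, \ldots, l_n} P(y \mid l_1, \ldots, l_n, x_1, \ldots, x_n) \prod_{k=1}^{n} P(l_k \mid l_1, \ldots, l_{k-1}, x_1, \ldots, x_{k-1})
$$

A counterfactual premise of the kind referred to as (3.62) is usually stated as sequential ignorability, $Y(x) \perp\!\!\!\perp X_k \mid L_1, \ldots, L_k, X_1 = x_1, \ldots, X_{k-1} = x_{k-1}$ for $k = 1, \ldots, n$. With a single action ($n = 1$) the formula reduces to the familiar adjustment formula $\sum_{l} P(y \mid x, l)\, P(l)$.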

Your suggestion to allow confounding arcs between
$X_i$ and

In general, condition (3.62) is both over-restrictive and lacking in
intuitive basis. A more general and intuitive condition leading to
(3.63) is formulated in (4.5) (*Causality*, p. 122), which reads
as follows:

**(3.62\*) General condition for g-estimation**

__Comment 1:__ The new definition leads to improvements over (3.62), namely,
there are cases where the *g*-formula (3.63) is valid with a subset
$L_k$ of the past but not with the entire past.

Assuming

(3.62) is also satisfied with the choice $L_1 = \emptyset$, but not
with $L_1 = Z$.
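Comment 1 can be checked numerically on a small model. The sketch below is a hedged illustration, not the graph from the original page (which is not reproduced here): it assumes a hypothetical M-shaped model in which an observed variable Z, measured before X, is a collider between two unobserved causes, U1 (of X) and U2 (of Y). With a single action the g-formula reduces to $\sum_l P(y \mid x, l) P(l)$, so the sketch compares $L_1 = \emptyset$ with $L_1 = \{Z\}$ against the true interventional probability. All probabilities and function names are made up for the example; exact fractions avoid sampling noise.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model (an assumption, not the figure from the original page):
#   U1 -> X -> Y <- U2,   U1 -> Z <- U2
# U1, U2 are unobserved fair coins; Z is observed and precedes X in time.
HALF = Fraction(1, 2)


def p_x(x, u1):
    """P(X = x | U1 = u1): X mostly copies U1."""
    p1 = Fraction(4, 5) if u1 else Fraction(1, 5)
    return p1 if x else 1 - p1


def p_z(z, u1, u2):
    """P(Z = z | U1, U2): Z is a deterministic collider, Z = U1 AND U2."""
    return Fraction(1) if z == u1 * u2 else Fraction(0)


def p_y1(x, u2):
    """P(Y = 1 | X = x, U2 = u2)."""
    return Fraction(1, 10) + Fraction(3, 10) * x + HALF * u2


def joint():
    """Yield (u2, z, x, weight) for every configuration with nonzero P(u1, u2, z, x)."""
    for u1, u2, z, x in product((0, 1), repeat=4):
        w = HALF * HALF * p_z(z, u1, u2) * p_x(x, u1)
        if w:
            yield u2, z, x, w


def p_y1_do(x):
    """Ground truth P(Y = 1 | do(X = x)): U2 keeps its prior distribution."""
    return sum(HALF * p_y1(x, u2) for u2 in (0, 1))


def p_y1_given(x, z=None):
    """Observational P(Y = 1 | X = x), or P(Y = 1 | X = x, Z = z) when z is given."""
    num = den = Fraction(0)
    for u2, zz, xx, w in joint():
        if xx == x and (z is None or zz == z):
            num += w * p_y1(x, u2)
            den += w
    return num / den


def p_z_marginal(z):
    """P(Z = z)."""
    return sum(w for _, zz, _, w in joint() if zz == z)


def g_formula(x, adjust_for_z):
    """One-step g-formula sum_l P(Y=1 | x, l) P(l), with L_1 = {} or L_1 = {Z}."""
    if not adjust_for_z:
        return p_y1_given(x)
    return sum(p_y1_given(x, z) * p_z_marginal(z) for z in (0, 1))


for x in (0, 1):
    print(x, p_y1_do(x), g_formula(x, False), g_formula(x, True))
# 0 7/20 7/20 47/120
# 1 13/20 13/20 47/80
```

In this toy model the empty set reproduces $P(Y=1 \mid do(X=x))$ exactly, while adjusting for the earlier variable Z does not, so using more of the past is not automatically safer.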

__Comment 2:__ Defining $L_k$ as the set of
"nondescendants" of

__Example 2:__

with temporal order: $U_1, X_1, S, Y$

Both (3.62) and (3.62\*) are satisfied with $L_1 = S$,
but not with $L_1 = \emptyset$.

__Comment 3:__ There are cases where (3.62) is not satisfied even under the
new interpretation of $L_k$, while the graphical
condition (3.62\*) is.

__Example 3:__ (constructed by Ilya Shpitser)

It is easy to see that (3.62\*) is satisfied; all back-door
action-avoiding paths from $X_1$ to $Y$ are
blocked by $\{X_0, Z, Z'\}$.
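Checking that a set blocks every back-door path is a mechanical d-separation test in a graph with the relevant arrows removed. Below is a minimal sketch of such a test using the standard ancestral moral graph criterion: restrict to the ancestors of the variables involved, drop arrow directions, marry co-parents, and ask whether the conditioning set separates the two sides. Because the figure for Example 3 is not reproduced here, the sketch is exercised on a hypothetical single-action M-shaped graph rather than on $X_0, Z, Z'$. The function names and the child-to-parents encoding are assumptions, and the sequential version would presumably also need arrows into later actions deleted, which this sketch does not do.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def d_separated(parents, xs, ys, zs):
    """True if zs d-separates xs from ys (ancestral moral graph criterion)."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moral graph of the ancestral subgraph: drop directions, marry co-parents.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            adj[child].add(p)
            adj[p].add(child)
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Breadth-first search from xs that never steps onto a node in zs.
    seen = xs - zs
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in adj[v] - zs - seen:
            seen.add(w)
            queue.append(w)
    return True


def cut_outgoing(parents, x):
    """Back-door graph for a single action: delete every arrow leaving x."""
    return {c: [p for p in ps if p != x] for c, ps in parents.items()}


# Hypothetical M-shaped graph (child -> parents), assumed for illustration only.
G = {"X": ["U1"], "Z": ["U1", "U2"], "Y": ["X", "U2"]}
G_bd = cut_outgoing(G, "X")
print(d_separated(G_bd, {"X"}, {"Y"}, set()))  # True: no open back-door path
print(d_separated(G_bd, {"X"}, {"Y"}, {"Z"}))  # False: conditioning on collider Z opens X <- U1 -> Z <- U2 -> Y
```

The moralization route was chosen over enumerating paths because it needs only a reachability search, which keeps the sketch short and avoids listing every path explicitly.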

At the same time, it is possible to show, though by a rather
intricate method (see the Twin Network Method, page 213), that
the counterfactual condition (3.62) fails in this model.

(In the twin network model there is a *d*-connected path from
$X_1$ to $Y_x$, as follows:

This example demonstrates one weakness of the Potential Response
approach initially taken by Robins in deriving (3.63). The counterfactual
condition (3.62) that legitimizes the use of the *g*-estimation
formula is devoid of intuitive support; hence, epidemiologists
who apply this formula do so without guidance from substantive
medical knowledge. Fortunately, graphical methods are slowly making
their way into epidemiological practice, and more and more
people are beginning to understand the assumptions behind *g*-estimation.

(Warning: Those who currently reign over causal analysis in statistics
are incurably graph-o-phobic and ruthlessly resist attempts to enlighten
their students, readers and co-workers with graphical methods. This
slows progress in statistical research, but it will eventually be overrun by
common sense.)
