3 Comments
Brian H Mathison PhD

Rarely comment, but really appreciated this post and its digest of the work of Wysocki et al. (2022), which offers a well-articulated critique of statistical control and its limitations when divorced from causal justification. It echoes the foundational arguments made by Judea Pearl, particularly in The Book of Why, against the overreliance on control groups and purely statistical associations for causal inference. While control groups can reveal associations, they often fail to address hidden confounders and cannot, on their own, establish causality.

Pearl’s “ladder of causation,” which spans association, intervention, and counterfactual reasoning, emphasizes that causal understanding requires structured assumptions and formal tools. Causal diagrams (DAGs) and the back-door criterion provide a framework for identifying which variables must be adjusted for to estimate causal effects from observational data. Wysocki et al. effectively highlight this point (citing Pearl four times), cautioning against common missteps, such as adjusting for colliders or mediators, which can introduce bias or obscure causal pathways. Their discussion of proxy variables is similarly nuanced, recognizing both their potential and their pitfalls.
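The collider point can be illustrated with a small simulation. This is only a sketch under assumptions that are not from Wysocki et al. or the post: the variables x, y, and c, the sample size, and the helper ols are all made up for illustration. Here x has no effect on y, and c is a collider caused by both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # illustrative sample size (assumption)

# x and y are generated independently, so the true effect of x on y is zero.
x = rng.normal(size=n)
y = rng.normal(size=n)
# c is a collider: it is caused by both x and y.
c = x + y + rng.normal(size=n)

def ols(columns, target):
    """Least-squares coefficients (intercept first) for target on columns."""
    design = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Without adjustment, the slope on x is close to the true value of 0.
slope_unadjusted = ols([x], y)[1]
# "Controlling for" the collider c pushes the slope on x toward -0.5.
slope_adjusted = ols([x, c], y)[1]

print(f"unadjusted slope: {slope_unadjusted:.3f}")
print(f"adjusted for collider: {slope_adjusted:.3f}")
```

In this toy setup, adding the control variable creates an association that does not exist causally, which is why the choice of controls needs a causal justification rather than a purely statistical one.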

While the shared post and discussion draw heavily from Wysocki et al., it’s essential to recognize the broader context: the sophistication of modern causal inference should enhance, not diminish, public trust in science. The complexity we confront in modeling causation is not a weakness; it’s a necessary response to the multifactorial, dynamic nature of biological systems. Scientific claims like “X causes Y” are appealing in their simplicity, but the real work lies in rigorously navigating a web of interdependent variables, mediators, confounders, and evolving biological processes.

Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. Basic Books.

Dillon

love this

barnabus

All very valid points. My only query concerns the regression equation in the introductory picture:

Y = beta0 + beta1 + beta2*x2 + ... + epsilon

Obviously, having both beta0 and beta1 act as intercepts would make the design matrix of the multiple regression rank-deficient, since it would contain two identical constant columns. Was it on purpose? Or was the x1 in the beta1*x1 term simply forgotten?
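The concern about two constant terms can be checked numerically. This is a minimal sketch with made-up data (the sample size and the random x1 and x2 columns are assumptions, not anything from the post). It compares the design matrix implied by the equation as printed with the one that includes a proper beta1*x1 term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # illustrative sample size (assumption)

ones = np.ones(n)
x1 = rng.normal(size=n)  # hypothetical predictor missing from the picture
x2 = rng.normal(size=n)

# Equation as printed: beta0 and beta1 both multiply a constant column.
design_as_printed = np.column_stack([ones, ones, x2])
# Presumably intended form: beta0 + beta1*x1 + beta2*x2.
design_intended = np.column_stack([ones, x1, x2])

rank_as_printed = np.linalg.matrix_rank(design_as_printed)
rank_intended = np.linalg.matrix_rank(design_intended)

print("columns:", design_as_printed.shape[1], "rank as printed:", rank_as_printed)
print("columns:", design_intended.shape[1], "rank intended:", rank_intended)
```

With three columns but rank 2, beta0 and beta1 are not separately identifiable in the printed form: only their sum is determined by the data.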
