Discussion about this post

tailcalled:

One notion of variance explained I like:

Suppose X is a cause of Y that explains r^2 of the variance in Y. If you have someone who is z standard deviations above average in Y, then in expectation r^2 z of that deviation is due to being above average in X. That is, the variance explained tells you how much of the deviation in Y is attributable to X.
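A quick Monte Carlo check of this claim (my own sketch, not from the comment): assume a standardized linear-Gaussian model Y = rX + sqrt(1 − r^2)ε with r = 0.6, and look at people roughly z = 2 standard deviations above average in Y.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.6            # corr(X, Y); X explains r^2 = 36% of var(Y)
n = 1_000_000
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)  # standardized Y

z = 2.0                             # people ~2 SD above average in Y
band = np.abs(y - z) < 0.05         # condition on Y ≈ z
contribution = r * x[band]          # the part of Y that comes from X
print(contribution.mean())          # ≈ r^2 * z = 0.72
```

The conditional mean of X given Y = z is rz, so X's contribution rX averages r^2 z, matching the claim.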

The trouble is that people are mixing up two different questions: explanation and intervention. If you want to intervene on Y through some variable X, then it doesn't matter how well X explains the preexisting variance in Y; it only matters how big the effect of X is. Conversely, if someone wants to explain Y, then it doesn't matter how big the interventions are that X supports; it only matters how much variance X explains. It just so happens that, in the linear-Gaussian case, there is a simple quadratic relationship between the effect size and the variance explained.
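A minimal sketch of that quadratic relationship (my own illustration; the model Y = bX + ε and the function name are assumptions, not from the thread):

```python
import numpy as np

def variance_explained(b, var_x=1.0, var_eps=1.0):
    """R^2 of X in the linear model Y = b*X + eps."""
    return b**2 * var_x / (b**2 * var_x + var_eps)

# For small effect sizes, doubling b roughly quadruples the
# variance explained -- the quadratic relationship in question.
for b in [0.1, 0.2, 0.4]:
    print(b, variance_explained(b))
```

With var_x = var_eps = 1, the effect size b = 0.1 explains about 1% of the variance, while b = 0.2 explains nearly 4%.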

Samuel Hapak:

This is so wrong.

You are clearly thinking that if Y = ∑x_i over ten iid terms, then corr(Y, x_i) = 0.1. And you are wrong, because if the xs are iid, then each of them would correlate ~ 32% (that is, 1/sqrt(10))!

That's because corr(A, B) = cov(A, B) / sqrt(var(A)var(B)), and var(Y) = ∑var(x_i)!
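This is easy to check numerically (my own sketch, assuming ten iid standard-normal terms and no noise term):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
xs = rng.standard_normal((n, 10))   # 10 iid terms, var 1 each
y = xs.sum(axis=1)                  # no noise: var(Y) = 10

# corr(Y, x_i) = cov / sqrt(var(x_i) * var(Y)) = 1 / sqrt(10) ≈ 0.316
c = np.corrcoef(xs[:, 0], y)[0, 1]
print(c)
```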

Now suppose Y = ∑x_i + ε with ten terms, and assume all the x_i are iid with var(x_i) = 1.

var(Y) = 10 × var(x_i) + var(ε) = 10 + var(ε)

cov(Y, x_i) = 1

And given that:

0.1 = corr(Y, x_i) = cov(Y, x_i) / sqrt(var(x_i) × var(Y)) = 1 / sqrt(10 + var(ε))

thus:

10 + var(ε) = 100, i.e. var(ε) = 90.

So, yes, all ten Xs combined explain only 10/100 = 10% of the variation.
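These numbers can be verified by simulation (my own sketch; it assumes var(ε) = 90, which follows from the derivation above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
xs = rng.standard_normal((n, 10))           # 10 iid causes, var 1 each
eps = np.sqrt(90) * rng.standard_normal(n)  # var(eps) = 90, so var(Y) = 100
y = xs.sum(axis=1) + eps

# each individual cause correlates ~ 1/sqrt(100) = 0.1 with Y
corrs = [np.corrcoef(xs[:, i], y)[0, 1] for i in range(10)]
print(np.round(corrs, 3))

# all ten together explain 10 / 100 = 10% of the variance
r2 = 1 - np.var(y - xs.sum(axis=1)) / np.var(y)
print(r2)   # ≈ 0.10
```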

