5 Comments
tailcalled

One notion of variance explained I like:

Suppose X is a cause of Y that explains r^2 of the variance in Y. If you have someone who is z standard deviations above average in Y, then in expectation r^2 × z of that is due to being above average in X. That is, the variance explained tells you how much of Y is explained by X.

The trouble is that people are mixing up two different questions: explanation and intervention. If you want to intervene on Y with some variable X, then it doesn't matter how well X explains the preexisting variance in Y. It just matters how big the effect of X is. But conversely, if someone wants to explain Y, then it doesn't matter how big the interventions are that it supports, it only matters how much variance it explains. It just so happens that there is a simple quadratic relationship between the effect size and the explanation validity (in the linear-Gaussian case).
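This relationship can be checked by simulation in the linear-Gaussian case (a quick sketch; the numbers r = 0.6 and z = 2 are illustrative, not from the comment):

```python
import numpy as np

# X explains r^2 of var(Y); among people at Y = z, the average part of Y
# due to X should come out to about r^2 * z.
rng = np.random.default_rng(0)
n = 1_000_000
r = 0.6                                   # corr(X, Y), so X explains r^2 = 0.36
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)   # var(y) = 1

z = 2.0                                   # people ~2 SD above average in Y
band = np.abs(y - z) < 0.05               # narrow slice around Y = z
print((r * x[band]).mean())               # ≈ r^2 * z = 0.72
```

The r × x term is the contribution of X to Y, and conditioning on the band around Y = z recovers the expected contribution r^2 × z.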

Samuel Hapak

This is so wrong.

You are clearly thinking that if Y = ∑x_i, then corr(Y, x_i) = 0.1. And you are wrong, because if xs are iid, then each of them would correlate ~ 32%!

That's because corr(A, B) = cov(A, B) / sqrt(var(A)var(B)), and var(Y) = ∑var(x_i)!

Suppose Y = ∑x_i + ε with ten x_i, assume all the x_i are iid, and let's further assume that var(x_i) = 1.

var(Y) = 10 × var(x_i) + var(ε) = 10 + var(ε)

cov(Y, x_i) = 1

And given that:

0.1 = corr(Y, x_i) = cov(Y, x_i) / sqrt(var(x_i) × var(Y)) = 1 / sqrt(10 + var(ε))

thus:

10 + var(ε) = 100, i.e. var(ε) = 90

So, yes, all the Xs combined explain only 10% of all the variation.
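The arithmetic above can be verified numerically (a quick sketch, not from the thread; with var(Y) = 100, the implied noise variance is var(ε) = 100 − 10 = 90):

```python
import numpy as np

# Ten iid unit-variance predictors plus noise with var(eps) = 90,
# so that each predictor correlates with Y at 0.1.
rng = np.random.default_rng(1)
n = 500_000
x = rng.standard_normal((n, 10))
eps = np.sqrt(90) * rng.standard_normal(n)
y = x.sum(axis=1) + eps

r1 = np.corrcoef(y, x[:, 0])[0, 1]                    # single-predictor correlation
r2_all = 1 - np.var(y - x.sum(axis=1)) / np.var(y)    # variance explained by all ten
print(r1)                                             # ≈ 0.10
print(r2_all)                                         # ≈ 0.10
```

Each predictor correlates at about 0.1 and explains 1% of the variance, so all ten together explain about 10%, matching the derivation.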

Cremieux

Are you daft? Are you really trying to debate a clearly valid hypothetical that you can simulate in two seconds to see that it's feasible?

>You are clearly thinking that if Y = ∑x_i, then corr(Y, x_i) = 0.1.

If something that isn't true is clear to you, then you have serious problems.

> And you are wrong, because if xs are iid, then each of them would correlate ~ 32%!

First, correlations are not percentages.

Second, you're confused: you're talking about the multiple correlation R, whereas I'm talking about the role of the individual r's for the level of the mean. sqrt(k × r^2) is about 0.316 in this hypothetical, so the composite of the predictors correlates with Y at about 0.316, sure. But this is irrelevant to my point! If you can't tell I'm talking about the predicted mean value, then I suppose that just proves you're daft, since that is obviously the focus of the article; it's damn near explicit.

One perfect predictor: \hat{Y} = 1 * 1 = 1

Ten orthogonal r = 0.10 predictors: \hat{Y} = 10 * 0.10 * 1 = 1

Same point-estimate, different precision. To the person doing selection, that is the sole difference.
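The contrast can be sketched in a short simulation (an illustrative setup, not from the article): someone sitting +1 SD on every predictor gets the same predicted Y in both cases, but the spread of realized outcomes around that prediction differs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# One perfect predictor: Y = X exactly, so at X = 1 the outcome is 1, no error.
y_perfect = np.full(n, 1.0)

# Ten orthogonal r = 0.10 predictors, each fixed at +1 SD: same mean
# prediction, but 1 - k*r^2 = 0.9 of the variance is left unexplained.
k, r = 10, 0.10
y_ten = k * r * 1.0 + np.sqrt(1 - k * r**2) * rng.standard_normal(n)

print(y_perfect.mean(), y_ten.mean())     # both ≈ 1.0
print(y_perfect.std(), y_ten.std())       # 0.0 vs ≈ 0.95
```

Both selection rules pick people with the same expected outcome; only the residual spread around that expectation distinguishes them.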

Comment deleted
Mar 20, 2023