Let us think about what goes wrong when we drop the assumption that \textsf{Var}(X) \neq 0 in theoretical linear regression.

Let X and Y be two real random variables with finite second moments, and suppose \textsf{Var}(X) = 0. (Note: the variance of X is zero precisely when \mathbf{P}(X = \mathbb E[X]) = 1.) We make no further assumptions on Y.

Which one of the following statements is false?

There is an infinite family of solutions (a,b) that minimize the squared mean error, \mathbb E[(Y - a - bX)^2].

There is no line y = a+bx that predicts Y given X with probability 1, regardless of their distribution.

With probability equal to 1, the random pair (X,Y) lies on the vertical line x = \mathbb E[X].

The false statement is:

"There is no line y = a+bx that predicts Y given X with probability 1, regardless of their distribution."

Since \textsf{Var}(X) = 0, X is almost surely equal to the constant c = \mathbb E[X]. The claim is universal ("regardless of their distribution"), so a single counterexample refutes it: if Y is also almost surely constant, say \mathbf{P}(Y = d) = 1, then the horizontal line y = d predicts Y given X with probability 1. Hence for some distributions a perfectly predicting line does exist, and the statement is false. (It is only when Y is non-degenerate that no line can work, since a + bc is a single number while Y takes more than one value with positive probability.)

The remaining statements are both true:

- There is an infinite family of solutions (a,b) that minimize the squared mean error, \mathbb E[(Y - a - bX)^2]. Because X = c almost surely, the error equals \mathbb E[(Y - (a + bc))^2], which depends on (a,b) only through the sum a + bc. It is minimized exactly when a + bc = \mathbb E[Y], so every pair on the line a = \mathbb E[Y] - bc achieves the same minimal error. This is why the usual least-squares formulas require \textsf{Var}(X) \neq 0: the slope b = \textsf{Cov}(X,Y)/\textsf{Var}(X) is no longer defined, and the minimizer is no longer unique.

- With probability equal to 1, the random pair (X,Y) lies on the vertical line x = \mathbb E[X]. Since \mathbf{P}(X = \mathbb E[X]) = 1, this holds whatever the distribution of Y.
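The non-uniqueness of the minimizer can be checked numerically. The sketch below (all values illustrative: c = 2, Y drawn as Gaussian noise around 5) estimates the squared mean error empirically and confirms that every pair on the line a + bc = E[Y] attains the same error, while a pair off that line does worse.

```python
import random

random.seed(0)

c = 2.0  # X equals the constant c with probability 1, so Var(X) = 0
y = [random.gauss(5.0, 1.0) for _ in range(10_000)]  # arbitrary Y with finite variance
ey = sum(y) / len(y)  # empirical estimate of E[Y]

def mse(a, b):
    """Empirical squared mean error E[(Y - a - bX)^2], using X = c almost surely."""
    return sum((yi - a - b * c) ** 2 for yi in y) / len(y)

# Every (a, b) with a + b*c = E[Y] should give the same, minimal error:
candidates = [(ey - b * c, b) for b in (-3.0, 0.0, 1.0, 10.0)]
errors = [mse(a, b) for a, b in candidates]
print(errors)                          # four (numerically) identical values
print(mse(ey + 1.0, 0.0) > errors[0])  # stepping off the line increases the error
```

Each candidate pair collapses to the same constant prediction a + bc = E[Y], which is why the errors coincide up to floating-point rounding.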