Normally, we think of linear regression as being performed on a data set \{ (x_i, y_i) \}_{i=1}^n, which we treat as a deterministic collection of points in Euclidean space. It is also helpful to consider an idealized scenario in which X and Y are random variables that follow some joint probability distribution and have finite first and second moments. In this problem, we will derive the solution to the theoretical linear regression problem.

Assume \textsf{Var}(X) \neq 0. The theoretical linear (least squares) regression of Y on X prescribes that we find a pair of real numbers a and b that minimize \mathbb E[(Y - a - bX)^2], over all possible choices of the pair (a,b).

To do so, we will use a classical calculus technique. Let f(a,b) = \mathbb E[(Y - a - bX)^2]. We then solve for the critical points of f, that is, the pairs (a,b) at which the gradient of f is zero.
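
To make the objective concrete, the following is a minimal Python sketch that approximates f(a,b) by a sample average and checks that an ordinary least squares fit on the sample does no worse than nearby pairs (a,b). The joint distribution used here (X standard normal, Y = 1 + 2X plus independent noise), the sample size, and the names f_hat, a_hat and b_hat are assumptions made only for illustration; they are not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution, chosen only for illustration:
# X ~ N(0, 1) and Y = 1 + 2 X + noise, so Var(X) != 0.
n = 100_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)


def f_hat(a, b):
    """Sample-average approximation of f(a, b) = E[(Y - a - bX)^2]."""
    return np.mean((y - a - b * x) ** 2)


# Least squares fit on the sample; for deg=1, np.polyfit returns
# the coefficients in the order [slope, intercept].
b_hat, a_hat = np.polyfit(x, y, deg=1)

# The fitted pair should do no worse than nearby pairs (a, b).
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert f_hat(a_hat, b_hat) <= f_hat(a_hat + da, b_hat + db)

print(a_hat, b_hat, f_hat(a_hat, b_hat))
```

Under these assumptions, the fitted pair should land close to (1, 2), the coefficients used to simulate Y, which gives a numerical target against which the critical point derived in this problem can be compared.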

Hint: Here, assume you can switch expectation and differentiation with respect to a and b. That is, \partial _ a \mathbb E[(\cdots )] = \mathbb E[ \partial _ a (\cdots ) ].

Use X and Y for random variables X and Y.

The partial derivatives are:

\partial _ a f = \mathbb E\Big[ \underline{\qquad} \Big]
\partial _ b f = \mathbb E\Big[