An aerospace manufacturing company would like to assess the performance of its existing planes for its latest design. Based on a sample size of n = 1000 flights, each with an identically designed plane, it collects data of the form (x_{1},y_{1}), \ldots , (x_{1000},y_{1000}), where x represents the distance traveled and y represents liters of fuel consumed.

You, as a statistician hired by the company, decide to perform linear regression on the model y = a + bx to predict the efficiency of the design. In the context of linear regression, recall that the mathematical model calls for:

\mathbf Y= \left( \begin{array}{c} y_1 \\ \vdots \\ y_{1000} \end{array} \right) \in \mathbb {R}^{1000}, \quad {\boldsymbol \varepsilon }\in \mathbb {R}^{1000}, \quad \mathbb {X} = \left( \begin{array}{cc} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_{1000} \end{array} \right) \in \mathbb {R}^{1000 \times 2}, \quad {\boldsymbol \beta }= \left( \begin{array}{c} a \\ b \end{array} \right) \in \mathbb {R}^2.

Assume that {\boldsymbol \varepsilon }\sim \mathcal{N}(0, \sigma ^2 I_{1000}) for some fixed \sigma ^2, so that \mathbf Y\sim \mathcal{N}(\mathbb {X} {\boldsymbol \beta }, \sigma ^2 I_{1000}).

Using the setup as above, you compute the LSE, which comes out to

\hat{{\boldsymbol \beta }} = \left( \begin{array}{c} \hat{a} \\ \hat{b} \end{array} \right) = \left( \begin{array}{c} 0.8 \text { liters} \\ 15.0 \text { liters / km} \end{array} \right).

Just from \hat{{\boldsymbol \beta }}, what is a reasonable prediction for the total amount of fuel a plane (in liters) consumes after 200 kilometers?
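
Plugging \( x = 200 \) km into the fitted line gives the prediction directly:

\hat{y} = \hat{a} + \hat{b}\, x = 0.8 + 15.0 \times 200 = 3000.8 \text{ liters}.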

Using the same setup as the previous problem:

\hat{{\boldsymbol \beta }} = \left( \begin{array}{c} \hat{a} \\ \hat{b} \end{array} \right) = \left( \begin{array}{c} 0.8 \text { liters} \\ 15.0 \text { liters / km} \end{array} \right).

Using n=1000 samples, by thinking of \mathbf Y as the vector of observations, we might also consider the Maximum Likelihood Estimator \beta _{MLE}. As a reminder, {\boldsymbol \beta }_{MLE} maximizes, over all choices of {\boldsymbol \beta }, the likelihood (or the log-likelihood) of \mathbf Y\sim \mathcal{N}(\mathbb {X} {\boldsymbol \beta }, \sigma ^2 I_{1000}).

Numerically, {\boldsymbol \beta }_{MLE} = \left( \begin{array}{c} a_{MLE} \\ b_{MLE} \end{array} \right), where:

a_{MLE} =
b_{MLE} =

To find $\beta_{MLE}$, we need to maximize the likelihood (or, equivalently, the log-likelihood) of $Y \sim \mathcal{N}(X\beta, \sigma^2 I_{1000})$. Since $Y$ follows a multivariate normal distribution, the log-likelihood is, up to an additive constant that does not depend on $\beta$:

$$\ell(\beta) = -\frac{1}{2} \log(|\sigma^2 I_{1000}|) - \frac{1}{2}(Y - X\beta)^T(\sigma^2 I_{1000})^{-1}(Y - X\beta)$$

To maximize $\ell(\beta)$, we can take the derivative with respect to $\beta$ and set it equal to zero. Let's calculate the derivatives:

$$\frac{\partial \ell(\beta)}{\partial \beta} = \frac{\partial}{\partial \beta}\left(-\frac{1}{2} \log(|\sigma^2 I_{1000}|) - \frac{1}{2}(Y - X\beta)^T(\sigma^2 I_{1000})^{-1}(Y - X\beta)\right)$$

The first term, $\frac{\partial}{\partial \beta}(-\frac{1}{2} \log(|\sigma^2 I_{1000}|))$, is equal to zero because it does not depend on $\beta$.

For the second term, note that $(\sigma^2 I_{1000})^{-1} = \frac{1}{\sigma^2} I_{1000}$ is constant in $\beta$, so we can pull it out and differentiate the quadratic form directly:

$$\frac{\partial}{\partial \beta}\left(- \frac{1}{2}(Y - X\beta)^T(\sigma^2 I_{1000})^{-1}(Y - X\beta)\right) = -\frac{1}{2} \frac{\partial}{\partial \beta}\left((Y - X\beta)^T(\sigma^2 I_{1000})^{-1}(Y - X\beta)\right)$$
$$= -\frac{1}{2} \frac{\partial}{\partial \beta}\left(\beta^TX^T\sigma^2 I_{1000}(Y - X\beta)\right) = -\frac{1}{2}(-2X^T\sigma^2 I_{1000})(Y - X\beta)$$
$$= \sigma^2X^T(Y - X\beta)$$

Setting this derivative equal to zero, we have:

$$\sigma^2X^T(Y - X\beta) = 0$$

Simplifying, we get:

$$X^TY - X^TX\beta = 0$$

Rearranging, we have:

$$X^TX\beta = X^TY$$

Solving for $\beta$, we get:

$$\beta_{MLE} = (X^TX)^{-1}X^TY$$
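Since the actual flight data is not given, this closed form can be sanity-checked on synthetic data. The sketch below (assuming NumPy, with made-up $x_i$, $y_i$ generated from the model $y = 0.8 + 15.0\,x$ plus Gaussian noise) compares $(X^TX)^{-1}X^TY$ against a library least squares solver:

```python
import numpy as np

# Synthetic stand-in for the flight data (the real x_i, y_i are not given):
# y = 0.8 + 15.0 * x plus Gaussian noise, mirroring the model in the problem.
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(100, 1000, size=n)           # distances in km
y = 0.8 + 15.0 * x + rng.normal(0, 5.0, n)   # fuel in liters

# Design matrix with an intercept column, as in the problem setup.
X = np.column_stack([np.ones(n), x])

# Closed-form MLE / LSE: beta = (X^T X)^{-1} X^T Y.
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Library least squares solution for comparison.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_closed)  # should be close to [0.8, 15.0]
assert np.allclose(beta_closed, beta_lstsq)
```

Using `np.linalg.solve` on the normal equations avoids explicitly forming the inverse $(X^TX)^{-1}$, which is both cheaper and numerically safer.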

This is exactly the formula for the least squares estimator computed in the previous problem: under Gaussian noise, the MLE coincides with the LSE. No further computation is needed; $\beta_{MLE}$ is the $\hat{\beta}$ already given:

$$\beta_{MLE} = \hat{\beta} = \left(\begin{array}{c} 0.8 \text{ liters} \\ 15.0 \text{ liters / km} \end{array}\right)$$

Therefore, $a_{MLE} = 0.8$ liters and $b_{MLE} = 15.0$ liters / km.

To find the Maximum Likelihood Estimators (MLE) for a and b, we need to maximize the likelihood function of Y given the observed data. The likelihood function is given by:

L({\boldsymbol \beta}) = \frac{1}{(\sqrt{2\pi}\sigma)^n} \exp\left(-\frac{1}{2\sigma^2}(Y - \mathbb {X} {\boldsymbol \beta})^T(Y - \mathbb {X} {\boldsymbol \beta})\right)

Taking the logarithm of the likelihood function, we have:

\log L({\boldsymbol \beta}) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(Y - \mathbb {X} {\boldsymbol \beta})^T(Y - \mathbb {X} {\boldsymbol \beta})
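Before differentiating, it can be reassuring to check numerically that this log-likelihood really peaks at the least squares solution. A sketch on synthetic data (hypothetical, assuming NumPy; the true flight measurements are not given):

```python
import numpy as np

# Synthetic data from the model y = 0.8 + 15.0 * x with known noise sigma.
rng = np.random.default_rng(2)
n, sigma = 1000, 5.0
x = rng.uniform(100, 1000, size=n)
y = 0.8 + 15.0 * x + rng.normal(0, sigma, n)
X = np.column_stack([np.ones(n), x])

def log_lik(beta):
    # Gaussian log-likelihood from the formula above.
    r = y - X @ beta
    return (-n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma**2)
            - r @ r / (2 * sigma**2))

# Least squares solution via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Perturbing beta_hat in any direction should strictly lower the likelihood.
for delta in [np.array([1.0, 0.0]), np.array([0.0, 0.1]), np.array([-0.5, 0.05])]:
    assert log_lik(beta_hat) > log_lik(beta_hat + delta)
```

Because the log-likelihood is a concave quadratic in $\beta$, its unique stationary point is the global maximum, which is what the perturbation check illustrates.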

Taking the derivative with respect to {\boldsymbol \beta} and setting it to zero, we can find the MLE estimators for a and b.

To find a_{MLE}, we differentiate the logarithm of the likelihood function with respect to a, and set it to zero:

\frac{\partial \log L({\boldsymbol \beta})}{\partial a} = -\frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - a - bx_i) = 0

Rearranging the equation, we get:

\sum_{i=1}^{n}(y_i - a - bx_i) = 0

Expanding and rearranging the equation, we get:

\sum_{i=1}^{n}y_i - na - b\sum_{i=1}^{n}x_i = 0

Solving for a, we get:

a_{MLE} = \frac{1}{n} \sum_{i=1}^{n}y_i - \frac{b}{n}\sum_{i=1}^{n}x_i

Similarly, to find b_{MLE}, we differentiate the logarithm of the likelihood function with respect to b, and set it to zero:

\frac{\partial \log L({\boldsymbol \beta})}{\partial b} = -\frac{1}{\sigma^2}\sum_{i=1}^{n}x_i(y_i - a - bx_i) = 0

Rearranging the equation, we get:

\sum_{i=1}^{n}x_i(y_i - a - bx_i) = 0

Expanding and rearranging the equation, we get:

\sum_{i=1}^{n}x_iy_i - a\sum_{i=1}^{n}x_i - b\sum_{i=1}^{n}x_i^2 = 0

Solving for b, we get:

b_{MLE} = \frac{\sum_{i=1}^{n}x_iy_i - a_{MLE}\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x_i^2}
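Note that the two expressions above are coupled: $a_{MLE}$ involves $b_{MLE}$ and vice versa. Solving them simultaneously amounts to solving a $2 \times 2$ linear system built from the sums. A sketch on synthetic data (hypothetical, assuming NumPy, since the actual flights are not given):

```python
import numpy as np

# Synthetic stand-in data following y = a + b*x with a = 0.8, b = 15.0.
rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(100, 1000, size=n)
y = 0.8 + 15.0 * x + rng.normal(0, 5.0, n)

# The two stationarity conditions form a 2x2 linear system in (a, b):
#   n * a          + (sum x_i)   * b = sum y_i
#   (sum x_i) * a  + (sum x_i^2) * b = sum x_i y_i
A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a_mle, b_mle = np.linalg.solve(A, rhs)

# Eliminating a gives the familiar centered formula for the slope.
b_check = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
assert np.isclose(b_mle, b_check)
```

The centered formula for the slope is what one gets by substituting the expression for $a_{MLE}$ into the equation for $b_{MLE}$ and simplifying.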

These two equations are exactly the normal equations solved by the least squares estimator, so the MLE coincides with the LSE. Given that \hat{{\boldsymbol \beta }} = \left( \begin{array}{c} 0.8 \text { liters} \\ 15.0 \text { liters / km} \end{array} \right), we conclude that a_{MLE} = 0.8 liters and b_{MLE} = 15.0 liters / km.