Consider the statistical set-up from the previous problem. In particular, recall that $\mathbf{u} = \frac{1}{\sqrt{5}} (1,2)^T$ and

$$\mathrm{\boldsymbol X}_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \mathrm{\boldsymbol X}_2 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad \mathrm{\boldsymbol X}_3 = \begin{pmatrix} -1 \\ 0 \end{pmatrix}.$$

Observe that for $i = 1,2,3$, the number $\mathbf{u}\cdot\mathrm{\boldsymbol X}_i$ (where $\mathbf{u}$ is a unit vector) gives the signed distance from the origin to the endpoint of the projection $\text{proj}_{\mathbf{u}} \mathrm{\boldsymbol X}_i$. By signed distance, we mean that $\left| \mathbf{u}\cdot\mathrm{\boldsymbol X}_i \right|$ is the length of $\text{proj}_{\mathbf{u}} \mathrm{\boldsymbol X}_i$ and

$$\begin{aligned}
\mathbf{u}\cdot\mathrm{\boldsymbol X}_i > 0 &\Longrightarrow \mathrm{\boldsymbol X}_i \text{ points approximately in the direction of } \mathbf{u}, \\
\mathbf{u}\cdot\mathrm{\boldsymbol X}_i < 0 &\Longrightarrow \mathrm{\boldsymbol X}_i \text{ points approximately in the opposite direction of } \mathbf{u}.
\end{aligned}$$

Compute the empirical variance of the data set

$$\mathbf{u}\cdot\mathrm{\boldsymbol X}_1,\quad \mathbf{u}\cdot\mathrm{\boldsymbol X}_2,\quad \mathbf{u}\cdot\mathrm{\boldsymbol X}_3.$$

Let $\mathbb{X}$ denote the matrix whose $i$-th row is $\mathrm{\boldsymbol X}_i^T$.

Recall that $S = \frac{1}{3} \mathbb{X}^T \left(I_3 - \frac{1}{3} \mathbf{1}\mathbf{1}^T\right) \mathbb{X}$ denotes the empirical covariance matrix of our data set.

What is $\mathbf{u}^T S \mathbf{u}$?
(You are encouraged to use computational software.)

Are your answers from part 1 and part 2 of this question the same?

To compute u^T S u (part 2), we first need to compute the empirical covariance matrix S.

First, let's compute the matrix X:

X = [1 2; 3 4; -1 0]

Next, let's write down the all-ones column vector 1:

1 = [1; 1; 1]

Now, let's compute the matrix A:

A = I3 - (1/3) * 11'

A = [2/3 -1/3 -1/3; -1/3 2/3 -1/3; -1/3 -1/3 2/3]
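The matrix A above is the centering matrix: multiplying X by it subtracts the column means from every row. A minimal sketch of that check, using exact fractions from the Python standard library (the helper names `centering_matrix` and `matmul` are made up for this illustration):

```python
from fractions import Fraction

def centering_matrix(n):
    """Return I_n - (1/n) * 1 1^T as a list of rows of exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(P, Q):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

X = [[1, 2], [3, 4], [-1, 0]]
A = centering_matrix(3)

# A * X subtracts the column means (1, 2) from each row of X
centered = matmul(A, X)
```

Here `centered` comes out as the rows (0, 0), (2, 2), (-2, -2), which are the data points minus their mean (1, 2).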

Finally, let's compute the empirical covariance matrix S:

S = (1/3) * X'A*X

S = [8/3 8/3; 8/3 8/3]

Now, let's compute u^T S u:

u = (1/sqrt(5)) * [1; 2]

u^T S u = (1/sqrt(5)) * [1 2] * [8/3 8/3; 8/3 8/3] * (1/sqrt(5)) * [1; 2]

u^T S u = (1/5) * [8 8] * [1; 2]

u^T S u = (1/5) * (8 + 16)

u^T S u = 24/5

Therefore, the value of u^T S u is 24/5 = 4.8.

For part 1, the projections u·X_1, u·X_2, u·X_3 are sqrt(5), 11/sqrt(5), -1/sqrt(5), with mean sqrt(5). Their empirical variance, with the same 1/3 normalization as S, is (1/3) * (0 + 36/5 + 36/5) = 24/5. In general, u^T S u is the empirical variance of the data projected onto u, so the answers from part 1 and part 2 are the same.
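Since the problem encourages computational software, here is one way to evaluate S and u^T S u directly from the definition. This is only a sketch: it uses exact fractions, and it avoids the square root by writing u = v / sqrt(5) with v = (1, 2), so that u^T S u = (v^T S v) / 5. The variable names are chosen for this illustration.

```python
from fractions import Fraction

X = [[1, 2], [3, 4], [-1, 0]]
n, d = len(X), len(X[0])

# Column means of the data
mean = [Fraction(sum(row[j] for row in X), n) for j in range(d)]

# S[j][k] = (1/n) * sum_i (X_ij - mean_j) * (X_ik - mean_k),
# which is the same matrix as (1/n) * X^T (I_n - (1/n) 1 1^T) X
S = [[sum((row[j] - mean[j]) * (row[k] - mean[k]) for row in X) / n
      for k in range(d)] for j in range(d)]

# u = v / sqrt(5) with v = (1, 2), so u^T S u = (v^T S v) / 5 exactly
v = [1, 2]
quad_form = sum(v[j] * S[j][k] * v[k] for j in range(d) for k in range(d)) / 5
```

With the three data points from the problem, every entry of `S` is 8/3 and `quad_form` is 24/5.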

To compute the empirical variance of the dataset, we first need to calculate the matrix X:

X = [[1, 2],
[3, 4],
[-1, 0]]

Next, we calculate S, the empirical covariance matrix:

S = (1/3) * X^T * (I3 - (1/3) * 11^T) * X

where I3 is the 3x3 identity matrix and 11^T is the 3x3 matrix of ones.

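Finally, we calculate u^T S u, where u is the vector (1/sqrt(5)) * [1, 2]. Since u appears on both sides of S, the scalar factor is (1/sqrt(5))^2 = 1/5:

u^T S u = (1/5) * [[1, 2]] * S * [[1],
[2]]

With S = [[8/3, 8/3], [8/3, 8/3]], this gives u^T S u = (1/5) * 24 = 24/5.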
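The link between this quadratic form and the variance of the projected data can be seen with a short derivation. Here $n = 3$ and $\bar{\mathrm{\boldsymbol X}} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{\boldsymbol X}_i$ denotes the sample mean, a symbol introduced for this sketch. It uses the fact that $\mathbb{X}^T (I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T)\mathbb{X} = \sum_{i=1}^{n} (\mathrm{\boldsymbol X}_i - \bar{\mathrm{\boldsymbol X}})(\mathrm{\boldsymbol X}_i - \bar{\mathrm{\boldsymbol X}})^T$:

$$\mathbf{u}^T S \mathbf{u} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{u}^T (\mathrm{\boldsymbol X}_i - \bar{\mathrm{\boldsymbol X}})(\mathrm{\boldsymbol X}_i - \bar{\mathrm{\boldsymbol X}})^T \mathbf{u} = \frac{1}{n}\sum_{i=1}^{n} \left(\mathbf{u}\cdot\mathrm{\boldsymbol X}_i - \mathbf{u}\cdot\bar{\mathrm{\boldsymbol X}}\right)^2$$

The right-hand side is the empirical variance (with $\frac{1}{n}$ normalization) of the numbers $\mathbf{u}\cdot\mathrm{\boldsymbol X}_1, \dots, \mathbf{u}\cdot\mathrm{\boldsymbol X}_n$, because $\mathbf{u}\cdot\bar{\mathrm{\boldsymbol X}}$ is their mean.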

To compute the empirical variance of the dataset $\mathbf{u}\cdot\mathrm{\boldsymbol X}_1, \mathbf{u}\cdot\mathrm{\boldsymbol X}_2, \mathbf{u}\cdot\mathrm{\boldsymbol X}_3$, we need to calculate the variance of these three values.

1. Compute $\mathbf{u}\cdot\mathrm{\boldsymbol X}_1$:
- $\mathbf{u} = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}$
- $\mathrm{\boldsymbol X}_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$
- $\mathbf{u}\cdot\mathrm{\boldsymbol X}_1 = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{\sqrt{5}}(1\cdot 1 + 2\cdot 2) = \frac{5}{\sqrt{5}} = \sqrt{5}$

2. Compute $\mathbf{u}\cdot\mathrm{\boldsymbol X}_2$:
- $\mathbf{u} = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}$
- $\mathrm{\boldsymbol X}_2 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$
- $\mathbf{u}\cdot\mathrm{\boldsymbol X}_2 = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \frac{1}{\sqrt{5}}(1\cdot 3 + 2\cdot 4) = \frac{11}{\sqrt{5}}$

3. Compute $\mathbf{u}\cdot\mathrm{\boldsymbol X}_3$:
- $\mathbf{u} = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}$
- $\mathrm{\boldsymbol X}_3 = \begin{pmatrix} -1 \\ 0 \end{pmatrix}$
- $\mathbf{u}\cdot\mathrm{\boldsymbol X}_3 = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} -1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{5}}(1\cdot (-1) + 2\cdot 0) = -\frac{1}{\sqrt{5}}$
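The three dot products above can be reproduced numerically. The following is a small floating-point sketch using only the standard library; `dot` is a helper defined here for illustration:

```python
import math

u = (1 / math.sqrt(5), 2 / math.sqrt(5))   # unit vector from the problem
points = [(1, 2), (3, 4), (-1, 0)]         # X_1, X_2, X_3

def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Signed distances of the projections onto u
projections = [dot(u, p) for p in points]
```

Up to rounding, `projections` holds sqrt(5), 11/sqrt(5) and -1/sqrt(5); the negative sign of the last value reflects that X_3 points roughly opposite to u.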

Now, compute the empirical variance using the formula:

$$\text{Empirical Variance} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{u}\cdot\mathrm{\boldsymbol X}_i - \bar{\mathbf{u}})^2$$

where $\bar{\mathbf{u}}$ is the mean of the values $\mathbf{u}\cdot\mathrm{\boldsymbol X}_1, \mathbf{u}\cdot\mathrm{\boldsymbol X}_2, \mathbf{u}\cdot\mathrm{\boldsymbol X}_3$. The $\frac{1}{n}$ normalization matches the $\frac{1}{3}$ factor in the question's definition of $S$.

In this case $n = 3$, so the formula becomes:

$$\text{Empirical Variance} = \frac{1}{3}\left[(\mathbf{u}\cdot\mathrm{\boldsymbol X}_1 - \bar{\mathbf{u}})^2 + (\mathbf{u}\cdot\mathrm{\boldsymbol X}_2 - \bar{\mathbf{u}})^2 + (\mathbf{u}\cdot\mathrm{\boldsymbol X}_3 - \bar{\mathbf{u}})^2\right]$$

Now, let's compute the empirical variance:

- Compute the mean, $\bar{\mathbf{u}}$:
- $\bar{\mathbf{u}} = \frac{\mathbf{u}\cdot\mathrm{\boldsymbol X}_1 + \mathbf{u}\cdot\mathrm{\boldsymbol X}_2 + \mathbf{u}\cdot\mathrm{\boldsymbol X}_3}{3}$
- $\bar{\mathbf{u}} = \frac{\sqrt{5} + \frac{11}{\sqrt{5}} - \frac{1}{\sqrt{5}}}{3}$
- $\bar{\mathbf{u}} = \frac{5 + 11 - 1}{3\sqrt{5}} = \frac{15}{3\sqrt{5}} = \frac{5}{\sqrt{5}} = \sqrt{5}$

- Compute the empirical variance:
- Empirical Variance $= \frac{1}{3}\left[(\sqrt{5} - \sqrt{5})^2 + \left(\frac{11}{\sqrt{5}} - \sqrt{5}\right)^2 + \left(-\frac{1}{\sqrt{5}} - \sqrt{5}\right)^2\right]$
- Empirical Variance $= \frac{1}{3}\left[0 + \left(\frac{6}{\sqrt{5}}\right)^2 + \left(-\frac{6}{\sqrt{5}}\right)^2\right] = \frac{1}{3}\cdot\frac{72}{5} = \frac{24}{5}$

The empirical variance of the projected data set is therefore $\frac{24}{5} = 4.8$.
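The choice of normalization matters for the final number. As a hedged illustration, Python's standard `statistics` module offers both conventions: `pvariance` divides by n, which matches the 1/3 factor in the question's definition of S, while `variance` divides by n - 1. The variable names below are chosen for this sketch.

```python
import math
import statistics

s5 = math.sqrt(5)
projections = [5 / s5, 11 / s5, -1 / s5]   # u.X_1, u.X_2, u.X_3

# Population-style variance: divides the sum of squared deviations by n
var_n = statistics.pvariance(projections)

# Sample variance: divides by n - 1 instead
var_n_minus_1 = statistics.variance(projections)
```

Up to floating-point rounding, `var_n` is 24/5 = 4.8 and `var_n_minus_1` is 36/5 = 7.2; only the first agrees with the convention used for S in this problem.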

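Regarding part 2 of the question, we are asked to compute $\mathbf{u}^T S \mathbf{u}$, where $S = \frac{1}{3} \mathbb{X}^T \left(I_3 - \frac{1}{3}\mathbf{1}\mathbf{1}^T\right)\mathbb{X}$ is the empirical covariance matrix defined in the question. The sample mean of the data is $(1,2)^T$, so the centered data points are $(0,0)^T$, $(2,2)^T$, $(-2,-2)^T$, which gives $S = \frac{1}{3}\begin{pmatrix} 8 & 8 \\ 8 & 8 \end{pmatrix}$ and $\mathbf{u}^T S \mathbf{u} = \frac{1}{5}\cdot\frac{1}{3}\cdot(8 + 16 + 16 + 32) = \frac{24}{5}$. This equals the empirical variance, with $\frac{1}{3}$ normalization, of the projected data from part 1, so the answers to the two parts are the same.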