Recall the statistical set-up above: X_1, \ldots , X_n \stackrel{iid}{\sim } \mathbf{P} are drawn from an unknown distribution \mathbf{P}. For all 1 \leq i \leq n, Y_i is a discrete random variable supported on \{ 1, \ldots , 5\} that denotes which bin contains the realization of X_i.

Let \mathbf{P}_{\mu , \sigma ^2} = \mathcal{N}(\mu , \sigma ^2) and let (\widehat{\mu }, \widehat{\sigma }^2) denote the MLE for the statistical model (\mathbb {R}, \{ P_{\mu , \sigma ^2} \} _{\mu \in \mathbb {R}, \sigma ^2\in (0,\infty )}), i.e. Gaussian with unknown mean and unknown variance. For 1 \leq j \leq 5, let N_ j denote the frequency of j (i.e. number of times that j appears) in the data set Y_1, \ldots , Y_ n.

Define the \chi ^2 test statistic

T_n = n \sum _{j = 1}^5 \frac{\left(\frac{N_j}{n} - P_{\widehat{\mu }, \widehat{\sigma }^2}(Z \in A_j)\right)^2}{P_{\widehat{\mu }, \widehat{\sigma }^2}(Z \in A_j)},

where Z \sim \mathcal{N}( \widehat{\mu }, \widehat{\sigma }^2 ). Then it holds that

T_ n \xrightarrow [n \to \infty ]{(d)} \chi ^2_\ell

for some constant \ell > 0.

What is \ell?

Hint: Use the result on the very last page of Lecture 15.

\ell = 2

In order to find the value of \(\ell\), we need to refer to the result stated on the very last page of Lecture 15.

According to that result, suppose a discrete random variable takes \(K\) possible values and the null hypothesis is that its pmf belongs to a parametric family \(\{f_\theta\}_{\theta \in \Theta}\) with \(\Theta \subset \mathbb{R}^d\). If \(\hat{\theta}\) is the maximum likelihood estimator (MLE) of \(\theta\) and \(N_j\) is the number of observations equal to \(j\), then under the null hypothesis the test statistic

\[T_n = n\sum_{j=1}^K \frac{\left(\frac{N_j}{n} - f_{\hat{\theta}}(j)\right)^2}{f_{\hat{\theta}}(j)}\]

converges in distribution to the chi-square distribution with \(K - d - 1\) degrees of freedom.

In the given setup, the MLE for the statistical model \((\mathbb{R}, \{P_{\mu, \sigma^2}\}_{\mu \in \mathbb{R}, \sigma^2 \in (0,\infty)})\) is denoted as \((\widehat{\mu}, \widehat{\sigma}^2)\). We are interested in finding the value of \(\ell\) for the test statistic

\[T_n = n\sum_{j=1}^5 \frac{\left(\frac{N_j}{n} - P_{\widehat{\mu}, \widehat{\sigma}^2}(Z \in A_j)\right)^2}{P_{\widehat{\mu}, \widehat{\sigma}^2}(Z \in A_j)}\]

where \(Z \sim \mathcal{N}(\widehat{\mu}, \widehat{\sigma}^2)\), and \(N_j\) denotes the frequency of \(j\) in the dataset \(Y_1, \ldots, Y_n\).

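In the \(\chi^2\) test for a parametric family, the limiting distribution has \(K - d - 1\) degrees of freedom, where \(K\) is the number of categories and \(d\) is the number of estimated parameters. Here the \(Y_i\) take \(K = 5\) values (one per bin), and the Gaussian model has \(d = 2\) unknown parameters, \(\mu\) and \(\sigma^2\). The fitted bin probabilities \(P_{\widehat{\mu}, \widehat{\sigma}^2}(Z \in A_j)\) play the role of the pmf evaluated at the MLE.

Hence, we have

\[\ell = K - d - 1 = 5 - 2 - 1 = 2.\]

Therefore, \(\ell = 2\): one degree of freedom is lost because the bin frequencies \(N_j / n\) sum to one, and two more are lost because \(\mu\) and \(\sigma^2\) are estimated from the data.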
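To make the statistic concrete, here is a minimal Python sketch that computes \(T_n\) for a simulated Gaussian sample and compares it with a \(\chi^2\) limit having \(K - d - 1 = 2\) degrees of freedom. The bin edges, the sample, and the function name `chi2_statistic` are assumptions chosen for illustration, since the bins \(A_j\) are defined earlier in the problem set and not in this section. The p-value uses the fact that a \(\chi^2_2\) variable is exponential with mean 2, and it should be read as an asymptotic approximation.

```python
import math
import random
from statistics import NormalDist


def chi2_statistic(xs, edges):
    """Compute T_n for bins (-inf, e_1], (e_1, e_2], ..., (e_{K-1}, +inf)."""
    n = len(xs)
    # Gaussian MLE: sample mean and the 1/n (not 1/(n-1)) variance.
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    fitted = NormalDist(mu_hat, math.sqrt(var_hat))

    # Fitted bin probabilities P_{mu_hat, var_hat}(Z in A_j).
    cdf = [0.0] + [fitted.cdf(e) for e in edges] + [1.0]
    probs = [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]

    # Bin counts N_j: the bin index is the number of edges strictly below x.
    counts = [0] * len(probs)
    for x in xs:
        counts[sum(x > e for e in edges)] += 1

    t_n = n * sum((c / n - p) ** 2 / p for c, p in zip(counts, probs))
    return t_n, counts, probs


K, d = 5, 2        # five bins, two estimated parameters (mu, sigma^2)
dof = K - d - 1    # degrees of freedom of the limiting chi-square

rng = random.Random(0)
sample = [rng.gauss(1.0, 2.0) for _ in range(500)]  # hypothetical data
edges = [-2.0, 0.0, 2.0, 4.0]                       # hypothetical bin edges

t_n, counts, probs = chi2_statistic(sample, edges)
# For 2 degrees of freedom, the chi-square survival function is exp(-t / 2).
p_value = math.exp(-t_n / 2)
print(f"T_n = {t_n:.3f}, dof = {dof}, approximate p-value = {p_value:.3f}")
```

A small p-value would be evidence against the Gaussian model; with data actually drawn from a Gaussian, as in this sketch, the test should usually not reject.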