Let \mathbf{p}^0 be the discrete pmf whose goodness of fit we wish to test against an observed sequence of iid samples. Let \widehat{\mathbf{p}} be the MLE computed from those iid samples.

What is \sqrt{n}\left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T \mathbf{1}?

Note: This is a vector dot product where \left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T is a row vector and \mathbf{1} is the all-ones column vector of appropriate size.

The expression \sqrt{n}\left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T \mathbf{1} can be interpreted as the dot product between the row vector \left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T and the column vector \sqrt{n}\mathbf{1}.

In this context, \mathbf{p}^0 represents the hypothesized probability mass function (pmf) whose goodness of fit we are testing. \widehat{\mathbf{p}} represents the maximum likelihood estimate (MLE) of the pmf based on the observed iid samples.

We subtract \mathbf{p}^0 from \widehat{\mathbf{p}} to measure the difference between the estimated pmf and the hypothesized pmf. Taking the transpose of this difference vector, \left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T, allows us to treat it as a row vector.

The column vector \mathbf{1} is an all-ones vector of appropriate size. Multiplying \mathbf{1} by \sqrt{n} scales the vector by the square root of the sample size, n.

Finally, taking the dot product between \left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T and \sqrt{n}\mathbf{1} sums the componentwise differences between the estimated pmf and the hypothesized pmf, scaled by \sqrt{n}.

0 is the correct answer.

I apologize for misunderstanding. The expression \sqrt{n}\left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T \mathbf{1} simplifies to 0.

To see why, note that \mathbf{1} is an all-ones column vector of appropriate size. When we multiply \mathbf{1} by \sqrt{n}, we simply scale each element of \mathbf{1} by \sqrt{n}. Therefore, the resulting column vector has every entry equal to \sqrt{n}.

Taking the dot product between \left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T (which is a row vector) and the scaled all-ones column vector gives \sqrt{n} times the sum of the elements of \widehat{\mathbf{p}} - \mathbf{p}^0. Both \widehat{\mathbf{p}} and \mathbf{p}^0 are pmfs, so the entries of each sum to 1. The sum of the deviations is therefore 1 - 1 = 0 for every sample, regardless of how well the estimated pmf fits the hypothesized pmf.

Therefore, the expression \sqrt{n}\left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T \mathbf{1} is identically 0. It carries no information about the goodness of fit; it only reflects the constraint that the components of a pmf sum to 1.
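
The argument can be written out as a short worked equation. This is a sketch in notation that is not in the original question: K is assumed to be the number of possible outcomes, and \widehat{p}_j and p^0_j are assumed to denote the j-th entries of \widehat{\mathbf{p}} and \mathbf{p}^0.

```latex
\sqrt{n}\left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^T \mathbf{1}
  = \sqrt{n}\sum_{j=1}^{K}\left(\widehat{p}_j - p^0_j\right)
  = \sqrt{n}\left(\sum_{j=1}^{K}\widehat{p}_j - \sum_{j=1}^{K} p^0_j\right)
  = \sqrt{n}\,(1 - 1)
  = 0.
```

The only facts used are that the MLE of a discrete pmf is itself a pmf (its entries are the empirical frequencies, which sum to 1) and that \mathbf{p}^0 is a pmf.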

To compute \sqrt{n}\left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T \mathbf{1}, we can break it down into smaller steps:

Step 1: Compute the vector \left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)
This can be computed by subtracting each element of \mathbf{p}^0 from the corresponding element of \widehat{\mathbf{p}}. Since \widehat{\mathbf{p}} and \mathbf{p}^0 are both probability mass functions, this vector will contain the differences between the estimated and hypothesized probabilities for each possible outcome.

Step 2: Transpose the vector obtained from Step 1
Taking the transpose of a column vector simply converts it into a row vector.

Step 3: Compute the dot product with the column vector \mathbf{1}
To compute the dot product, we multiply the corresponding elements of the transposed vector obtained from Step 2 and the column vector \mathbf{1}, and then sum the products.

Step 4: Multiply the result by \sqrt{n}
Finally, we multiply the result obtained from Step 3 by \sqrt{n}.

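Therefore, \sqrt{n}\left(\widehat{\mathbf{p}} - \mathbf{p}^0\right)^ T \mathbf{1} is obtained by performing the above steps in order. Because the entries of \widehat{\mathbf{p}} and of \mathbf{p}^0 each sum to 1, the dot product in Step 3 is already 0, so the final result is 0.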
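
The four steps can also be checked numerically. The following is a minimal sketch, not part of the original exchange: the pmf values, the sample size, the seed, and the names `p0`, `p_true`, and `p_hat` are all illustrative assumptions. The samples are deliberately drawn from a pmf different from the hypothesized one, to show that the result is 0 even when the fit is poor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothesized pmf p^0 over K = 4 outcomes (illustrative values, assumed).
p0 = np.array([0.1, 0.2, 0.3, 0.4])

# Draw n iid samples from a *different* pmf, so the fit is deliberately poor.
n = 1000
p_true = np.array([0.4, 0.3, 0.2, 0.1])
samples = rng.choice(len(p0), size=n, p=p_true)

# MLE of a discrete pmf: the empirical frequencies of each outcome.
counts = np.bincount(samples, minlength=len(p0))
p_hat = counts / n

# Step 1: difference vector.
diff = p_hat - p0
# Steps 2-3: dot product with the all-ones vector (sum of the entries).
ones = np.ones(len(p0))
dot = diff @ ones
# Step 4: scale by sqrt(n).
result = np.sqrt(n) * dot

print(result)  # zero up to floating-point rounding
```

The individual entries of `diff` are far from zero here, yet they cancel in the sum, which is why this quantity cannot be used on its own to assess goodness of fit.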