A quantile-quantile (QQ) plot is an informal but useful method for goodness-of-fit testing. Suppose we have an i.i.d. sample X_1, X_2, \ldots, X_n with unknown cdf F^*. We want to determine whether F^* is the cdf F of some known distribution. For example, we may set F = \Phi_{0,1} to check whether the sample comes from a standard Gaussian \mathcal{N}(0,1).

The quantile-quantile (QQ) plot is constructed in the following way from a data set:

Reorder the samples to be in increasing order. Denote the reordered sample by X_{(1)}, X_{(2)}, \ldots , X_{(n)}.

Plot the points

\bigg(F^{-1}\left(\frac{1}{n}\right), X_{(1)}\bigg), \, \, \bigg(F^{-1}\left(\frac{2}{n}\right), X_{(2)}\bigg), \, \, \ldots , \, \, \bigg(F^{-1}\left(\frac{i}{n}\right), X_{(i)}\bigg), \, \, \ldots , \, \, \bigg(F^{-1}\left(\frac{n-1}{n}\right), X_{(n-1)}\bigg).

Note that we omit the n-th point because F^{-1}(n/n) = F^{-1}(1) = \infty for distributions with unbounded support, such as the Gaussian. (When F^{-1}(1) is finite, that point does not need to be omitted.)
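The two construction steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `qq_points` and the five-value sample are hypothetical, and the standard Gaussian quantile function F^{-1} = \Phi_{0,1}^{-1} is taken from the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8+). Drawing the points with a plotting library is left out.

```python
from statistics import NormalDist


def qq_points(sample, inv_cdf):
    """Return the QQ-plot points (F^{-1}(i/n), X_(i)) for i = 1, ..., n-1.

    The n-th point is dropped because F^{-1}(1) is infinite for
    distributions with unbounded support, such as N(0, 1).
    """
    xs = sorted(sample)  # order statistics X_(1) <= X_(2) <= ... <= X_(n)
    n = len(xs)
    return [(inv_cdf(i / n), xs[i - 1]) for i in range(1, n)]


# Reference distribution F = Phi_{0,1}, the standard Gaussian cdf.
std_normal = NormalDist(mu=0.0, sigma=1.0)

sample = [0.3, -1.2, 2.1, 0.0, -0.4]  # hypothetical data
pts = qq_points(sample, std_normal.inv_cdf)
for theoretical, observed in pts:
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

This follows the i/n convention used above. Many software packages instead use adjusted plotting positions such as (i - 0.5)/n, which keeps every probability strictly below 1 so that no point has to be dropped.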

Which of the following are true about quantile-quantile (QQ) plots? (Choose all that apply.)

A QQ plot provides a visual method for determining whether or not a data set has a certain distribution.

A QQ plot is a rigorous, formal method of goodness of fit testing. For example, it makes sense to talk about the type 1 error of a goodness of fit test.

If n is very large and the points on the QQ plot lie very far from the line y = x, then it is reasonable to conclude that F and F^* are close as cdfs.

Looking at the graphs of the empirical cdf F_n(t) and the cdf F, it can be difficult to tell if the two functions are close. The QQ plot transforms the cdf F and the empirical cdf F_n(t) so that it is easier to compare the two visually.

Each statement about quantile-quantile (QQ) plots is evaluated below:

1. A QQ plot provides a visual method for determining whether or not a data set has a certain distribution.

This statement is true. A QQ plot compares the quantiles of the observed data with the quantiles of the hypothesized distribution F. If the points lie on or close to the line y = x, the data are consistent with F; systematic departures from that line suggest they are not.

2. A QQ plot is a rigorous, formal method of goodness of fit testing. For example, it makes sense to talk about the type 1 error of a goodness of fit test.

This statement is false. A QQ plot is a useful tool for assessing goodness of fit, but it is not a formal hypothesis test: it has no test statistic, no rejection rule, and no p-value. Without a rule for rejecting the null hypothesis F^* = F, the type I error (the probability of rejecting when F^* = F is actually true) is not even defined. The QQ plot is an exploratory tool, not a formal testing procedure.

3. If n is very large and the points on the QQ plot lie very far from the line y = x, then it is reasonable to conclude that F and F^* are close as cdfs.

This statement is false. When n is large, each order statistic X_{(i)} approximates the i/n quantile of the true distribution F^*, so the QQ plot points lie close to the points with coordinates F^{-1}(i/n) and (F^*)^{-1}(i/n). Points far from the line y = x therefore indicate that the quantiles of F^* differ from those of F, and the reasonable conclusion is that F and F^* are not close as cdfs. As with any visual check, this conclusion is informal; attaching an error probability to it requires a formal test.
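A small simulation illustrates the large-n case. As an assumption chosen purely for illustration, the true distribution is F^* = \mathcal{N}(1, 1) while the hypothesized F is \mathcal{N}(0, 1); the sample size, seed, and helper name `qq_points` are likewise hypothetical. With n = 10,000 the points fall along a line parallel to y = x but shifted up by about 1, so the gap reflects the mismatch between F^* and F rather than sampling noise.

```python
import random
from statistics import NormalDist, mean


def qq_points(sample, inv_cdf):
    """QQ-plot points (F^{-1}(i/n), X_(i)) for i = 1, ..., n-1."""
    xs = sorted(sample)
    n = len(xs)
    return [(inv_cdf(i / n), xs[i - 1]) for i in range(1, n)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 10_000

# Assumed setup: the true cdf F^* is N(1, 1), but we compare against F = N(0, 1).
sample = [rng.gauss(1.0, 1.0) for _ in range(n)]
points = qq_points(sample, NormalDist(0.0, 1.0).inv_cdf)

# Vertical distance from each QQ point to the line y = x.
gaps = [observed - theoretical for theoretical, observed in points]
print(f"mean gap from y = x: {mean(gaps):.3f}")
```

A mean vertical gap near 1 at this sample size is exactly the "points far from y = x" situation, and it correctly signals that F^* is not F.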

4. Looking at the graphs of the empirical cdf F_n(t) and the cdf F, it can be difficult to tell if the two functions are close. The QQ plot transforms the cdf F and the empirical cdf F_n(t) so that it is easier to compare the two visually.

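This statement is true. When there are no ties, F_n(X_{(i)}) = i/n, so the order statistic X_{(i)} is the empirical i/n quantile. The QQ plot therefore graphs empirical quantiles against the theoretical quantiles F^{-1}(i/n), and if F^* = F the points should lie near the straight line y = x. Judging whether points fall on a straight line is much easier by eye than judging whether two S-shaped cdf curves are close. Discrepancies in the tails, which are nearly invisible on a cdf plot because both curves flatten toward 0 and 1, also become clearly visible.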
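The quantile view can be checked numerically. The sketch below assumes the generalized-inverse definition F_n^{-1}(p) = \min\{t : F_n(t) \ge p\} for the empirical quantile function; the helper name `empirical_quantile`, the seed, and n = 5,000 are hypothetical choices for illustration. It confirms that the QQ point at p = i/n pairs F^{-1}(p) with F_n^{-1}(p) = X_{(i)}, and that when the data really come from F = \mathcal{N}(0, 1) the central points stay close to y = x.

```python
import random
from statistics import NormalDist


def empirical_quantile(xs, p):
    """Generalized inverse F_n^{-1}(p) = min{t : F_n(t) >= p}, for 0 < p <= 1.

    For sorted xs this is X_(k), where k is the smallest index with
    k/n >= p; k is found by bisection over 1..n.
    """
    n = len(xs)
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        if mid / n >= p:
            hi = mid
        else:
            lo = mid + 1
    return xs[lo - 1]


rng = random.Random(1)  # fixed seed for reproducibility
n = 5_000
xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))  # here F^* = F = N(0, 1)
F_inv = NormalDist(0.0, 1.0).inv_cdf

# Each QQ point pairs the theoretical and empirical quantiles at p = i/n.
points = [(F_inv(i / n), empirical_quantile(xs, i / n)) for i in range(1, n)]

# The empirical quantile at i/n is exactly the order statistic X_(i).
assert all(q == xs[i - 1] for i, (_, q) in enumerate(points, start=1))

# In the central 80% of the plot, the points hug the line y = x.
central_gaps = [abs(y - x) for x, y in points[n // 10 : 9 * n // 10]]
print(f"largest central gap from y = x: {max(central_gaps):.3f}")
```

The central slice is used because the extreme order statistics fluctuate more, so even under F^* = F the outermost points can wander further from the line.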

Therefore, the correct options are:

- A QQ plot provides a visual method for determining whether or not a data set has a certain distribution.
- Looking at the graphs of the empirical cdf F_n(t) and the cdf F, it can be difficult to tell if the two functions are close. The QQ plot transforms the cdf F and the empirical cdf F_n(t) so that it is easier to compare the two visually.

The other two statements are false: a QQ plot is not a rigorous, formal test (so it has no type I error), and when n is very large, points lying far from the line y = x indicate that F and F^* are not close as cdfs.