In this problem, we explore how to update a belief successively as data are observed. The model is as follows:

\theta \in \Theta, the parameter space; and \pi(\cdot) is the prior distribution of \theta.

We observe data X_1, \ldots, X_n that are i.i.d. conditional on the parameter, and we calculate the likelihood function L_n(X_1, \ldots, X_n | \theta) (as in the setting of maximum likelihood estimation).

Write \phi(X_1, \ldots, X_n) for a placeholder function that depends on X_1, \ldots, X_n but not on the parameter \theta. (\phi could stand for different functions in different equations; it is simply a placeholder whenever we want to collect terms that depend only on X_1, \ldots, X_n.)

In this context, we add observations one by one, computing the likelihood L_i(X_1, \ldots, X_i | \theta) and the posterior \pi(\theta | X_1, \ldots, X_i) after each observation i. Which of the following identities are true? (Choose all that apply.)

\pi(\theta | X_1, \ldots, X_n) = \pi(\theta) \cdot L_n(X_1, \ldots, X_n | \theta) \cdot \phi(X_1, \ldots, X_n)

L_n(X_1, \ldots, X_n | \theta) = L_{n-1}(X_1, \ldots, X_{n-1} | \theta) \cdot L_1(X_n | \theta)

\pi(\theta | X_1, \ldots, X_n) = \pi(\theta | X_1, \ldots, X_{n-1}) \cdot L_1(X_n | \theta) \cdot \phi(X_1, \ldots, X_n)

L_n(X_1, \ldots, X_n | \theta) = \frac{\pi(\theta | X_1, \ldots, X_n)}{\pi(\theta)} \phi(X_1, \ldots, X_n)

All four identities are true:

1. \pi(\theta | X_1, \ldots, X_n) = \pi(\theta) \cdot L_n(X_1, \ldots, X_n | \theta) \cdot \phi(X_1, \ldots, X_n)

This is Bayes' theorem up to normalization: the posterior \pi(\theta | X_1, \ldots, X_n) is proportional to the prior \pi(\theta) times the likelihood L_n(X_1, \ldots, X_n | \theta), and the normalizing constant depends only on the data, so it can be absorbed into \phi(X_1, \ldots, X_n).
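Written out, Bayes' theorem gives

\pi(\theta | X_1, \ldots, X_n) = \frac{\pi(\theta) \, L_n(X_1, \ldots, X_n | \theta)}{\int_\Theta \pi(\theta') \, L_n(X_1, \ldots, X_n | \theta') \, d\theta'},

and the reciprocal of the denominator is a function of X_1, \ldots, X_n alone, so it is exactly a valid choice of \phi(X_1, \ldots, X_n).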

2. L_n(X_1, \ldots, X_n | \theta) = L_{n-1}(X_1, \ldots, X_{n-1} | \theta) \cdot L_1(X_n | \theta)

This factorization holds because the observations are i.i.d. conditional on \theta: the joint density of X_1, \ldots, X_n given \theta is a product over the individual observations, so the likelihood of the first n-1 observations splits off from the likelihood of the nth.
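Conditional independence gives the full product form, from which the identity follows by grouping the first n-1 factors:

L_n(X_1, \ldots, X_n | \theta) = \prod_{i=1}^{n} L_1(X_i | \theta) = L_{n-1}(X_1, \ldots, X_{n-1} | \theta) \cdot L_1(X_n | \theta).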

3. \pi(\theta | X_1, \ldots, X_n) = \pi(\theta | X_1, \ldots, X_{n-1}) \cdot L_1(X_n | \theta) \cdot \phi(X_1, \ldots, X_n)

This is the recursive update formula: the posterior after n-1 observations plays the role of the prior for the nth observation. It follows by substituting identity 2 into identity 1 and then applying identity 1 at step n-1; every leftover factor depends only on the data and is absorbed into \phi.
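Explicitly, substituting identity 2 into identity 1 and then using identity 1 at step n-1 (whose data-only normalizing factor is again absorbed into \phi),

\pi(\theta | X_1, \ldots, X_n) = \pi(\theta) \cdot L_{n-1}(X_1, \ldots, X_{n-1} | \theta) \cdot L_1(X_n | \theta) \cdot \phi(X_1, \ldots, X_n) = \pi(\theta | X_1, \ldots, X_{n-1}) \cdot L_1(X_n | \theta) \cdot \phi(X_1, \ldots, X_n),

where, per the convention above, the two occurrences of \phi denote different functions of the data.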

4. L_n(X_1, \ldots, X_n | \theta) = \frac{\pi(\theta | X_1, \ldots, X_n)}{\pi(\theta)} \phi(X_1, \ldots, X_n)

This is Bayes' theorem rearranged: solving identity 1 for the likelihood expresses L_n as the posterior-to-prior ratio times a factor that depends only on the data.
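Dividing both sides of identity 1 by \pi(\theta) \phi(X_1, \ldots, X_n) gives

L_n(X_1, \ldots, X_n | \theta) = \frac{\pi(\theta | X_1, \ldots, X_n)}{\pi(\theta)} \cdot \frac{1}{\phi(X_1, \ldots, X_n)},

and 1/\phi(X_1, \ldots, X_n) is itself a function of X_1, \ldots, X_n alone, so it may in turn be written as \phi(X_1, \ldots, X_n).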

Thus, all four identities are correct.
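As a numerical sanity check of identities 1 and 3, here is a minimal sketch in Python; the Bernoulli model, uniform prior, and grid approximation are illustrative choices, not part of the problem. It updates a discretized posterior one observation at a time and confirms that the result matches the batch posterior computed from the full likelihood.

import numpy as np

rng = np.random.default_rng(0)
theta = np.linspace(0.001, 0.999, 999)  # grid over the parameter space Theta = (0, 1)
prior = np.full_like(theta, 1.0 / theta.size)  # uniform prior, normalized on the grid

x = rng.binomial(1, 0.7, size=20)  # i.i.d. Bernoulli(0.7) observations (illustrative)

def likelihood_one(x_i, theta):
    # L_1(x_i | theta) for a single Bernoulli observation
    return theta**x_i * (1.0 - theta)**(1 - x_i)

# Sequential update (identity 3): the posterior after i observations serves as
# the prior for observation i + 1; renormalizing plays the role of phi.
posterior_seq = prior.copy()
for x_i in x:
    posterior_seq = posterior_seq * likelihood_one(x_i, theta)
    posterior_seq /= posterior_seq.sum()

# Batch update (identity 1): prior times the full likelihood, normalized once.
L_n = np.prod([likelihood_one(x_i, theta) for x_i in x], axis=0)
posterior_batch = prior * L_n
posterior_batch /= posterior_batch.sum()

print(np.allclose(posterior_seq, posterior_batch))  # True

The two posteriors agree because each intermediate normalization contributes only a data-dependent constant, which is precisely the role \phi plays in the identities above.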