Even though logistic regression is formulated with continuous input data in mind, one can also apply it to categorical inputs. For example, consider the following set-up: we observe [mathjaxinline]\, n \,[/mathjaxinline] samples [mathjaxinline]\, Y_i \in \{ 0, 1\} \,[/mathjaxinline], [mathjaxinline]\, i = 1, \dots, n \,[/mathjaxinline], and covariates [mathjaxinline]\, X_i \in \{ 0, 1\} \,[/mathjaxinline], [mathjaxinline]\, i = 1, \dots, n \,[/mathjaxinline]. Moreover, assume that given [mathjaxinline]\, X_i \,[/mathjaxinline], the [mathjaxinline]\, Y_i \,[/mathjaxinline] are independent.
First, let us apply regular maximum likelihood estimation. To this end, write
[mathjax]\begin{aligned} f_{00} &= \frac{1}{n} \# \{ i : X_i = 0 \text{ and } Y_i = 0 \} \\ f_{01} &= \frac{1}{n} \# \{ i : X_i = 0 \text{ and } Y_i = 1 \} \\ f_{10} &= \frac{1}{n} \# \{ i : X_i = 1 \text{ and } Y_i = 0 \} \\ f_{11} &= \frac{1}{n} \# \{ i : X_i = 1 \text{ and } Y_i = 1 \} \end{aligned}[/mathjax]
and assume that [mathjaxinline]\, f_{00}, f_{01}, f_{10}, f_{11} > 0 \,[/mathjaxinline]. We can parametrize this model in terms of
[mathjax]\begin{aligned} p_{01} &= P(Y_i = 1 \mid X_i = 0) \\ p_{11} &= P(Y_i = 1 \mid X_i = 1) \end{aligned}[/mathjax]
Compute the maximum likelihood estimators [mathjaxinline]\, \widehat{p}_{01} \,[/mathjaxinline] and [mathjaxinline]\, \widehat{p}_{11} \,[/mathjaxinline] for [mathjaxinline]\, p_{01} \,[/mathjaxinline] and [mathjaxinline]\, p_{11} \,[/mathjaxinline], respectively. Express your answer in terms of [mathjaxinline]f_{00}[/mathjaxinline] (enter “A”), [mathjaxinline]f_{01}[/mathjaxinline] (enter “B”), [mathjaxinline]f_{10}[/mathjaxinline] (enter “C”), [mathjaxinline]f_{11}[/mathjaxinline] (enter “D”) and [mathjaxinline]n[/mathjaxinline].
\widehat{p}_{01}
B/(A+B)
correct
\widehat{p}_{11}
D/(C+D)
correct
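A quick way to sanity-check these formulas is to simulate data and confirm that the closed-form estimators B/(A+B) and D/(C+D) recover the true conditional probabilities. The sketch below is illustrative, assuming Python with NumPy; the sample size, seed, and true probabilities are made-up choices.

```python
import numpy as np

# Check that p_hat_01 = f01/(f00 + f01) and p_hat_11 = f11/(f10 + f11)
# recover P(Y=1 | X=0) and P(Y=1 | X=1). (Hypothetical simulation set-up.)
rng = np.random.default_rng(0)
n = 10_000
p01_true, p11_true = 0.3, 0.7          # P(Y=1 | X=0), P(Y=1 | X=1)

X = rng.integers(0, 2, size=n)
Y = np.where(X == 1,
             rng.random(n) < p11_true,
             rng.random(n) < p01_true).astype(int)

# Empirical cell frequencies f_kl = #{i : X_i = k and Y_i = l} / n
f00 = np.mean((X == 0) & (Y == 0))
f01 = np.mean((X == 0) & (Y == 1))
f10 = np.mean((X == 1) & (Y == 0))
f11 = np.mean((X == 1) & (Y == 1))

p01_hat = f01 / (f00 + f01)            # B / (A + B)
p11_hat = f11 / (f10 + f11)            # D / (C + D)
print(p01_hat, p11_hat)                # both close to the true values
```

Both estimates land close to 0.3 and 0.7, since each is just the empirical conditional frequency of Y = 1 within the corresponding X-cell.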
(b)
Although the [mathjaxinline]\, X_ i \,[/mathjaxinline] are discrete, we can also use a logistic regression model to analyze the data. That is, now we assume
[mathjax]Y_i \mid X_i \sim \textsf{Ber}\left( \frac{1}{1 + e^{-(X_i \beta_1 + \beta_0)}} \right),[/mathjax]
for [mathjaxinline]\, \beta _0, \beta _1 \in \mathbb {R} \,[/mathjaxinline], and that given [mathjaxinline]\, X_ i \,[/mathjaxinline], the [mathjaxinline]\, Y_ i \,[/mathjaxinline] are independent.
Calculate the maximum likelihood estimators [mathjaxinline]\, \widehat{\beta }_0 \,[/mathjaxinline] and [mathjaxinline]\, \widehat{\beta }_1 \,[/mathjaxinline] for [mathjaxinline]\, \beta _0 \,[/mathjaxinline] and [mathjaxinline]\, \beta _1 \,[/mathjaxinline], where we again assume that all [mathjaxinline]\, f_{kl} > 0 \,[/mathjaxinline]. Express your answer in terms of [mathjaxinline]f_{00}[/mathjaxinline] (enter “A”), [mathjaxinline]f_{01}[/mathjaxinline] (enter “B”), [mathjaxinline]f_{10}[/mathjaxinline] (enter “C”), [mathjaxinline]f_{11}[/mathjaxinline] (enter “D”) and [mathjaxinline]n[/mathjaxinline].
[mathjaxinline]\widehat{\beta }_{0}[/mathjaxinline]
(A+D)/(A+B+C+D)
incorrect
[mathjaxinline]\widehat{\beta }_{1}[/mathjaxinline]
(B+D)/(A+B+C+D)
To calculate the maximum likelihood estimators for β₀ and β₁, we maximize the log-likelihood function:
L(β₀, β₁) = ∑[i=1 to n] (Yᵢ log(pᵢ) + (1 - Yᵢ) log(1 - pᵢ))
Where pᵢ = 1 / (1 + exp(-(Xᵢβ₁ + β₀)))
Setting the partial derivatives of L(β₀, β₁) with respect to β₀ and β₁ to zero yields the score equations:
∂L/∂β₀ = ∑[i=1 to n] (Yᵢ - pᵢ) = 0
∂L/∂β₁ = ∑[i=1 to n] Xᵢ(Yᵢ - pᵢ) = 0
Since Xᵢ ∈ {0, 1}, the fitted probability pᵢ takes only two values:
pᵢ = 1 / (1 + exp(-β₀)) if Xᵢ = 0,
pᵢ = 1 / (1 + exp(-(β₀ + β₁))) if Xᵢ = 1.
1. The second score equation involves only the indices with Xᵢ = 1. There are n(f₁₀ + f₁₁) such indices, of which n·f₁₁ have Yᵢ = 1, so it reads:
n·f₁₁ - n(f₁₀ + f₁₁) / (1 + exp(-(β₀ + β₁))) = 0
Rearranging:
1 / (1 + exp(-(β̂₀ + β̂₁))) = f₁₁ / (f₁₀ + f₁₁)
2. Subtracting the second score equation from the first leaves the sum over the indices with Xᵢ = 0. There are n(f₀₀ + f₀₁) such indices, of which n·f₀₁ have Yᵢ = 1, so:
n·f₀₁ - n(f₀₀ + f₀₁) / (1 + exp(-β₀)) = 0
Rearranging:
1 / (1 + exp(-β̂₀)) = f₀₁ / (f₀₀ + f₀₁)
In other words, the fitted conditional probabilities coincide with the estimators p̂₀₁ and p̂₁₁ from part (a): with two free parameters and a binary covariate, the logistic model is saturated.
Inverting the sigmoid, 1 / (1 + exp(-t)) = q ⟺ t = log(q / (1 - q)), we obtain:
β̂₀ = log(f₀₁ / f₀₀), i.e. ln(B/A)
β̂₀ + β̂₁ = log(f₁₁ / f₁₀), i.e. ln(D/C)
Therefore, the maximum likelihood estimator for β₁ is:
β̂₁ = log(f₁₁ / f₁₀) - log(f₀₁ / f₀₀) = log((f₀₀·f₁₁) / (f₀₁·f₁₀)), i.e. ln((A·D)/(B·C))
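The closed forms β̂₀ = log(f₀₁/f₀₀) = ln(B/A) and β̂₁ = log((f₀₀·f₁₁)/(f₀₁·f₁₀)) = ln((A·D)/(B·C)), obtained by inverting the fitted sigmoid, can be checked numerically: fit the logistic model by Newton's method and compare with the formulas. A minimal sketch, assuming Python with NumPy; the sample size, seed, and true parameters are made-up choices.

```python
import numpy as np

# Fit the logistic model by Newton's method and compare the result with the
# closed forms log(f01/f00) and log(f00*f11/(f01*f10)). (Hypothetical set-up.)
rng = np.random.default_rng(1)
n = 20_000
beta0_true, beta1_true = -0.4, 0.8

X = rng.integers(0, 2, size=n)
p_true = 1.0 / (1.0 + np.exp(-(beta1_true * X + beta0_true)))
Y = (rng.random(n) < p_true).astype(int)

b = np.zeros(2)                           # (beta0, beta1), start at the origin
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(b[0] + b[1] * X)))
    grad = np.array([np.sum(Y - p), np.sum(X * (Y - p))])   # score equations
    w = p * (1.0 - p)                     # Bernoulli variances
    # Observed information matrix; note X**2 = X since X is binary.
    H = np.array([[np.sum(w), np.sum(X * w)],
                  [np.sum(X * w), np.sum(X * w)]])
    b += np.linalg.solve(H, grad)         # Newton step

f00 = np.mean((X == 0) & (Y == 0)); f01 = np.mean((X == 0) & (Y == 1))
f10 = np.mean((X == 1) & (Y == 0)); f11 = np.mean((X == 1) & (Y == 1))
print(b[0], np.log(f01 / f00))                   # agree
print(b[1], np.log(f00 * f11 / (f01 * f10)))     # agree
```

Because the model is saturated, the iterative fit matches the closed-form expressions essentially to machine precision, not just approximately.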