We consider a 1-dimensional logistic regression problem, i.e., we assume that data \, X_i \in \mathbb{R}, \; i = 1, \dots, n \, are given and that we get independent observations of

Y_ i | X_ i \sim \textsf{Ber}\left( \frac{\mathbf e^{\beta X_ i}}{1 + \mathbf e^{\beta X_ i}} \right),

where \, \beta \in \mathbb {R} \,.

Moreover, recall that the associated log likelihood for \, \beta \, is then given by

\ell (\beta ) = \sum _{i=1}^{n} \left( Y_ i X_ i \beta - \ln (1 + \exp (X_ i \beta )) \right)

Calculate the first and second derivative of \, \ell \,. Instructions: The summation \sum_{i=1}^{n} is already placed to the left of the answer box. Enter the summands in terms of \beta, X_i (enter "X_i") and Y_i (enter "Y_i").

\displaystyle \ell '(\beta ) = \sum _{i=1}^{n}
X_i*Y_i-((X_i*e^(X_i*beta))/(1+e^(X_i*beta)))
correct

\displaystyle \ell ^{\prime \prime }(\beta ) = \sum _{i=1}^{n}
-(((X_i)^2)*e^(X_i*beta))/((1+e^(X_i*beta))^2)
correct
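For reference, both entries follow by differentiating each summand of \, \ell \, (a short derivation; the quotient rule is applied to the second term):

\frac{d}{d\beta}\left( Y_i X_i \beta - \ln(1 + \exp(X_i \beta)) \right) = Y_i X_i - \frac{X_i e^{X_i \beta}}{1 + e^{X_i \beta}},

\frac{d}{d\beta}\left( - \frac{X_i e^{X_i \beta}}{1 + e^{X_i \beta}} \right) = -\frac{X_i^2 e^{X_i \beta}}{(1 + e^{X_i \beta})^2}.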
What can you conclude about \, \ell '(\beta ) \,?

\, \ell ' \, is neither increasing nor decreasing on the whole of \, \mathbb {R} \,.

\, \ell ' \, is strictly decreasing.

\, \ell ' \, is strictly increasing.
correct
(b)
3/3 points (graded)
Imagine we are given the following data (n=2):

X_1 = 0, \qquad Y_1 = 0,
X_2 = 1, \qquad Y_2 = 1.
In order to give the maximum likelihood estimator, we want to solve

\ell '(\beta ) = 0

for the given data.

First, we rewrite this as

\ell '(\beta ) = f(\beta ) + g,

where

f(\beta) = -\sum_{i=1}^{n} X_i \frac{1}{1 + \mathbf{e}^{-X_i \beta}},

and g is some appropriate value.
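The decomposition splits the two parts of each summand of \, \ell' \,, using e^{X_i \beta}/(1 + e^{X_i \beta}) = 1/(1 + e^{-X_i \beta}):

\ell'(\beta) = \sum_{i=1}^{n} \left( Y_i X_i - \frac{X_i e^{X_i \beta}}{1 + e^{X_i \beta}} \right) = \underbrace{-\sum_{i=1}^{n} X_i \frac{1}{1 + \mathbf{e}^{-X_i \beta}}}_{f(\beta)} + \underbrace{\sum_{i=1}^{n} Y_i X_i}_{g}.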

What is the range of \, f(\beta ) \,?

\mathbb {R}

\mathbb {R}_{< 0} = \{ r \in \mathbb {R}: r < 0\}

(-1,0), the unit open interval

\{ -1,0\}, the set containing two values, -1 and 0
correct
What is g?

1
correct
What can you conclude about the solution \, \beta \,?

\, \beta = 1 \,.

\, \beta = 0 \,.

There is no \, \beta \, that solves \, \ell '(\beta ) = 0 \,.

All \, \beta \in \mathbb {R} \, solve \, \ell '(\beta ) = 0 \,.
correct
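For the given data the separation is explicit: since X_1 = 0 contributes nothing to f,

\ell'(\beta) = 1 - \frac{1}{1 + \mathbf{e}^{-\beta}} = \frac{\mathbf{e}^{-\beta}}{1 + \mathbf{e}^{-\beta}} > 0 \quad \text{for all } \beta \in \mathbb{R},

so \, \ell' \, never vanishes: the likelihood keeps increasing as \, \beta \to \infty \, and no maximizer exists.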
(c)
5 points possible (graded)
The problem you encountered in part (b) is called separation. It occurs when the \, Y_i \, can be perfectly recovered by a linear classifier, i.e., when there is a \, \beta \, such that

X_i \beta > 0 \implies Y_i = 1,
X_i \beta < 0 \implies Y_i = 0.
In order to avoid this behavior, one option is to use a prior on \, \beta \,. Let us investigate what happens if we assume that \, \beta \, is drawn from a \, N(0, 1) \, distribution, i.e.,

P(\beta , Y | X) = P(\beta ) \prod _{i=1}^{n} P(Y_ i | X_ i, \beta )

What is the joint log likelihood \, \widetilde{\ell }(\beta ) \, of this Bayesian model? Again, for simplicity, let's plug in (X_1,Y_1) = (0,0) and (X_2,Y_2) = (1,1). (Try to work out the general formula on your own. It will also be provided in the solution.)

\, \widetilde{\ell }(\beta ) =\quad \,
unanswered

Now, we want to find the maximum a posteriori probability estimate, which is obtained by finding \, \beta \, such that \, \widetilde{\ell}'(\beta) = 0 \,. To this end, calculate the first and second derivative \, \widetilde{\ell}'(\beta) \, and \, \widetilde{\ell}''(\beta) \,.

\displaystyle \widetilde\ell '(\beta )=\quad
unanswered

\displaystyle \widetilde\ell ^{\prime \prime }(\beta )=\quad
unanswered

What can you conclude about \, \widetilde{\ell }'(\beta ) \,?

\, \widetilde{\ell }' \, is neither increasing nor decreasing on the whole of \, \mathbb {R} \,.

\, \widetilde{\ell }' \, is strictly decreasing.

\, \widetilde{\ell }' \, is strictly increasing.
unanswered
Given the same data as in (b), what can you say about the existence of a solution?

Applying the same arguments as in (b), we see that there is no optimal \, \beta \,.

Modifying the definition of \, f \, in (b) accordingly, we see that \, f \, now ranges over all of \, \mathbb{R} \,, hence there is a solution.

unanswered

The joint log likelihood of the Bayesian model is given by:

\[
\widetilde{\ell }(\beta ) = \ln(P(\beta )) + \sum _{i=1}^{n} \ln(P(Y_i | X_i, \beta ))
\]

Given the data (X_1, Y_1) = (0,0) and (X_2, Y_2) = (1,1), we can plug these values into the formula to find the joint log likelihood.

\[
\widetilde{\ell }(\beta ) = \ln(P(\beta )) + \ln(P(Y_1 | X_1, \beta )) + \ln(P(Y_2 | X_2, \beta ))
\]

Since we assume that \beta is drawn from a N(0,1) distribution, the prior density P(\beta) is the normal PDF with mean 0 and standard deviation 1:

\[
P(\beta) = \frac{1}{\sqrt{2\pi}} e^{-\frac{\beta^2}{2}}
\]

Using the given data and the logistic regression model, we can calculate the probabilities P(Y_1 | X_1, \beta) and P(Y_2 | X_2, \beta). The first term in the joint log likelihood becomes:

\[
\ln(P(\beta )) = \ln\left(\frac{1}{\sqrt{2\pi}} e^{-\frac{\beta^2}{2}}\right) = -\frac{\beta^2}{2} - \frac{1}{2}\ln(2\pi)
\]

Now, let's calculate the first derivative of the joint log likelihood with respect to β:

\[
\widetilde{\ell }'(\beta) = \frac{d\widetilde{\ell }(\beta)}{d\beta}
\]

This can be obtained by differentiating each term in the joint log likelihood equation and summing them up. Similarly, we can calculate the second derivative:

\[
\widetilde{\ell }''(\beta) = \frac{d^2\widetilde{\ell }(\beta)}{d\beta^2}
\]

These derivatives will help us find the maximum a posteriori probability estimate by solving the equation \(\widetilde{\ell }'(\beta ) = 0\).

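Carrying out the calculation for the given data (a worked completion; notation as above): since P(Y_1 = 0 | X_1 = 0, \beta) = \frac{1}{1 + e^{0}} = \frac{1}{2} and P(Y_2 = 1 | X_2 = 1, \beta) = \frac{e^{\beta}}{1 + e^{\beta}}, we obtain

\[
\widetilde{\ell}(\beta) = -\frac{\beta^2}{2} - \frac{1}{2}\ln(2\pi) - \ln 2 + \beta - \ln(1 + e^{\beta}),
\]

\[
\widetilde{\ell}'(\beta) = -\beta + 1 - \frac{e^{\beta}}{1 + e^{\beta}}, \qquad
\widetilde{\ell}''(\beta) = -1 - \frac{e^{\beta}}{(1 + e^{\beta})^2}.
\]

More generally, \(\widetilde{\ell}(\beta) = -\frac{\beta^2}{2} - \frac{1}{2}\ln(2\pi) + \ell(\beta)\), so \(\widetilde{\ell}'(\beta) = -\beta + \ell'(\beta)\) and \(\widetilde{\ell}''(\beta) = -1 + \ell''(\beta)\). Since \(\widetilde{\ell}''(\beta) \leq -1 < 0\), the derivative \(\widetilde{\ell}'\) is strictly decreasing; it tends to \(+\infty\) as \(\beta \to -\infty\) and to \(-\infty\) as \(\beta \to +\infty\), so \(\widetilde{\ell}'(\beta) = 0\) has exactly one solution and the MAP estimate exists and is unique.

A minimal numerical sketch (plain Python with only the standard library; the function names are illustrative, not part of the course material) that finds this root with Newton's method:

import math

def lt_prime(beta):
    # first derivative of the joint log likelihood for the data (0, 0), (1, 1)
    return -beta + 1.0 - math.exp(beta) / (1.0 + math.exp(beta))

def lt_double_prime(beta):
    # second derivative; always below -1, so Newton's method is well behaved
    return -1.0 - math.exp(beta) / (1.0 + math.exp(beta)) ** 2

beta = 0.0
for _ in range(100):
    step = lt_prime(beta) / lt_double_prime(beta)
    beta -= step
    if abs(step) < 1e-12:
        break

print("MAP estimate:", beta)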