Question

In a given population, n individuals are sampled randomly, with replacement, and each sampled individual is asked whether his/her salary is greater than some fixed threshold z. Assume that the salary of a randomly chosen individual has the exponential distribution with unknown parameter λ. Asking whether the salary exceeds a given threshold, rather than asking for the salary directly, increases the number of people who are willing to answer and decreases the number of mistakes in the collected answers. Denote by X1, ..., Xn the binary responses (Xi ∈ {0, 1}, i = 1, ..., n) of the n sampled individuals.
1. What is the distribution of the Xi's?
2. Let X̄n be the proportion of sampled individuals whose response was 1 (corresponding to Yes). Prove that X̄n is asymptotically normal and compute the asymptotic variance.
3. Find a function f such that f(X̄n) is a consistent estimator of λ.
4. Prove that f(X̄n) is asymptotically normal and compute the asymptotic variance.
5. What equation must z satisfy in order to minimize the asymptotic variance computed in Question 4? Write this equation in the form gλ(z) = z, where gλ is a function that depends on the unknown parameter λ.
6. Let Y1, ..., Yn be the salaries of the n sampled people.
a) If one could actually observe Y1, ..., Yn, what would be the statistical model?
b) In that case, what would be the Fisher information (as a function of the unknown parameter λ)? Denote it by IY(λ).
c) In the model where only the Xi's are observed (with fixed threshold z), what is the Fisher information? Denote it by IX(λ).
d) Compare IY(λ) and IX(λ): which one is larger? How do you interpret this fact?

Answer 1

1. Each Xi is a Bernoulli random variable: Xi = 1 if the individual's salary Yi exceeds the threshold z, and Xi = 0 otherwise. Since Yi is exponential with parameter λ, the success probability is p = P(Yi > z) = e^(−λz). Because the sampling is random and with replacement, the Xi's are i.i.d. Bernoulli(e^(−λz)).

2. X̄n = (X1 + ... + Xn)/n is the sample mean of the Xi's (a proportion, not their sum). The Xi's are i.i.d. Bernoulli(p) with p = e^(−λz), so each has mean p and variance p(1 − p). The Central Limit Theorem then gives √n (X̄n − p) → N(0, p(1 − p)) in distribution as n → ∞. The asymptotic variance is p(1 − p) = e^(−λz) (1 − e^(−λz)).

3. Since p = P(Yi > z) = e^(−λz), we have λ = −ln(p)/z. Take f(x) = −ln(x)/z for x in (0, 1]. By the law of large numbers, X̄n → p in probability, and f is continuous at p > 0, so by the continuous mapping theorem f(X̄n) = −ln(X̄n)/z → f(p) = λ in probability. Hence f(X̄n) is a consistent estimator of λ.

4. With f(x) = −ln(x)/z, f is differentiable at p = e^(−λz) with f'(p) = −1/(zp) ≠ 0. Combining the CLT for X̄n, √n (X̄n − p) → N(0, p(1 − p)), with the delta method gives √n (f(X̄n) − λ) → N(0, f'(p)² p(1 − p)) in distribution. The asymptotic variance is f'(p)² p(1 − p) = (1 − p)/(z² p) = (e^(λz) − 1)/z².
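A quick way to sanity-check an asymptotic variance like this is by simulation. The sketch below assumes P(salary > z) = e^(−λz) and the plug-in estimator −ln(X̄n)/z, for which the delta method gives the variance (e^(λz) − 1)/z². The function names (estimate_rate, asymptotic_variance, simulate) and all parameter values are illustrative choices, not part of the exercise.

```python
import math
import random


def estimate_rate(responses, z):
    """Plug-in estimate of lambda from binary answers (1 = salary above z)."""
    x_bar = sum(responses) / len(responses)
    if x_bar == 0.0:
        raise ValueError("no 'yes' answers: the estimate would be infinite")
    return -math.log(x_bar) / z


def asymptotic_variance(lam, z):
    """Delta-method variance of sqrt(n) * (estimate - lambda)."""
    return (math.exp(lam * z) - 1.0) / z**2


def simulate(lam, z, n, reps, seed=0):
    """Return (mean estimate, n * empirical variance) over `reps` samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Salaries are drawn as Exp(lam); only the indicator "salary > z" is kept.
        responses = [1 if rng.expovariate(lam) > z else 0 for _ in range(n)]
        estimates.append(estimate_rate(responses, z))
    mean = sum(estimates) / reps
    var = sum((e - mean) ** 2 for e in estimates) / (reps - 1)
    return mean, n * var
```

For instance, simulate(lam=2.0, z=0.5, n=1000, reps=2000) returns the average estimate and n times its empirical variance; the second value can be compared with asymptotic_variance(2.0, 0.5).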

5. The asymptotic variance from Question 4 is V(z) = (e^(λz) − 1)/z², for z > 0. Setting its derivative with respect to z to zero: V'(z) = [λz e^(λz) − 2(e^(λz) − 1)]/z³ = 0, i.e. λz e^(λz) = 2(e^(λz) − 1). Dividing by λ e^(λz) gives z = (2/λ)(1 − e^(−λz)), which has the required form gλ(z) = z with gλ(z) = (2/λ)(1 − e^(−λz)). Since V(z) → ∞ both as z → 0 and as z → ∞, the positive solution is a minimum; numerically it is λz ≈ 1.59.
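The fixed-point form suggests a simple numerical method. The sketch below assumes the equation is z = (2/λ)(1 − e^(−λz)), obtained by minimizing (e^(λz) − 1)/z², and iterates the map from a starting point away from the trivial root z = 0. The name optimal_threshold and the tolerance settings are illustrative. In practice λ is unknown, so this only shows what the best threshold would be for a given λ.

```python
import math


def optimal_threshold(lam, tol=1e-12, max_iter=1000):
    """Solve z = (2 / lam) * (1 - exp(-lam * z)) for z > 0 by fixed-point iteration."""
    z = 2.0 / lam  # start away from the trivial solution z = 0
    for _ in range(max_iter):
        z_next = (2.0 / lam) * (1.0 - math.exp(-lam * z))
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")
```

The iteration converges because the derivative of the map at the positive root, 2e^(−λz), is below 1 in absolute value there. The solution scales as 1/λ: the product λ · optimal_threshold(λ) is the same for every λ.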

6. a) If one could observe Y1, ..., Yn, they would be i.i.d. exponential random variables with parameter λ. The statistical model would be ((0, ∞), {Exp(λ)}, λ > 0): the sample space (0, ∞) together with the family of exponential distributions indexed by λ > 0.

b) The log-likelihood of one observation y is ℓ(λ) = ln λ − λy, so ℓ''(λ) = −1/λ² and the Fisher information is IY(λ) = 1/λ² per observation (n/λ² for the whole sample).

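c) Each Xi is Bernoulli(p) with p = e^(−λz), so the log-likelihood of one observation x is ℓ(λ) = x ln p + (1 − x) ln(1 − p). With dp/dλ = −z e^(−λz), the Fisher information is IX(λ) = (dp/dλ)² / (p(1 − p)) = z² e^(−λz)/(1 − e^(−λz)) = z²/(e^(λz) − 1) per observation (n times that for the whole sample). This is the inverse of the asymptotic variance (e^(λz) − 1)/z² of the estimator −ln(X̄n)/z.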

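d) IY(λ) = 1/λ² is larger than IX(λ) = z²/(e^(λz) − 1) for every threshold z > 0. Writing u = λz, the ratio is IX(λ)/IY(λ) = u²/(e^u − 1), which is below 1 for all u > 0 and reaches its maximum, about 0.65, at the optimal threshold of Question 5 (u ≈ 1.59). Observing the actual salaries Y1, ..., Yn therefore provides more information about λ than observing only the binary responses: each Xi is a function of Yi, and reducing a salary to a Yes/No answer discards information. Even with the best threshold, the binary design retains only about 65% of the information.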
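The comparison in d) can also be checked numerically. The sketch below assumes the per-observation formulas IY(λ) = 1/λ² for the salaries and IX(λ) = z²/(e^(λz) − 1) for the binary responses; the function names are illustrative.

```python
import math


def fisher_info_salary(lam):
    """Per-observation Fisher information when the salary itself is observed."""
    return 1.0 / lam**2


def fisher_info_binary(lam, z):
    """Per-observation Fisher information when only 1{salary > z} is observed."""
    # expm1(x) computes exp(x) - 1 accurately for small x.
    return z**2 / math.expm1(lam * z)


def relative_efficiency(lam, z):
    """Share of the salary information retained by the binary response."""
    return fisher_info_binary(lam, z) / fisher_info_salary(lam)
```

Evaluating relative_efficiency over a grid of thresholds shows a ratio that stays below 1 and depends on λ and z only through the product λz.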