Question

Let X be a continuous random variable. We know that it takes values between 0 and 6, but we do not know its distribution or its mean and variance, although we know that its variance is at most 4. We are interested in estimating the mean of X, which we denote by h. To estimate h, we take n i.i.d. samples X1, …, Xn, which all have the same distribution as X, and compute the sample mean.

H = (1/n) ∑(i=1 to n) Xi.

1. Express your answers for this part in terms of h and n using standard notation.

a) E[H]=

Given the available information, the smallest upper bound for Var(H) that we can assert/guarantee is:

b) Var(H)≤

2. Calculate the smallest possible value of n such that the standard deviation of H is guaranteed to be at most 0.01.

This minimum value of n is:

3. We would like to be at least 96% sure that our estimate is within 0.02 of the true mean h. Using the Chebyshev inequality, calculate the minimum value of n that will achieve this.

This minimum value of n is:

4. Suppose now that X is uniformly distributed on [h−3, h+3], for some unknown h. Using the Central Limit Theorem, identify the most appropriate expression for a 95% confidence interval for h. You may want to refer to the normal table.

Answer 1

1. a) E[H] = E[(1/n) ∑(i=1 to n) Xi] = (1/n) ∑(i=1 to n) E[Xi] = (1/n) * n * h = h

b) Since the Xi are independent, the variance of their sum is the sum of their variances:

Var(H) = Var((1/n) ∑(i=1 to n) Xi) = (1/n^2) ∑(i=1 to n) Var(Xi) = (1/n^2) * n * Var(Xi) = Var(Xi) / n

Since Var(Xi) ≤ 4 (given information),

Var(H) ≤ 4 / n
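The two results above can be checked numerically. The sketch below is only an illustration: it assumes X is Uniform(0, 6), which is one of many distributions allowed by the problem (values in [0, 6], variance 3 ≤ 4). The function name `simulate_sample_means` and the chosen n, number of trials, and seed are hypothetical.

```python
import random
import statistics


def simulate_sample_means(n, trials, seed=0):
    """Return `trials` sample means, each computed from n i.i.d. Uniform(0, 6) draws.

    Uniform(0, 6) is an assumed example distribution, not given in the problem.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        total = sum(rng.uniform(0.0, 6.0) for _ in range(n))
        means.append(total / n)
    return means


if __name__ == "__main__":
    n = 25
    means = simulate_sample_means(n, trials=20_000)
    # For Uniform(0, 6): h = 3 and Var(X) = 3, so Var(H) = 3 / n.
    print("empirical E[H]  :", statistics.fmean(means))
    print("empirical Var(H):", statistics.pvariance(means))
    print("guaranteed bound:", 4 / n)
```

The empirical mean should land near h = 3, and the empirical variance near 3/n, which sits below the guaranteed bound 4/n.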

2. We want the standard deviation of H to be at most 0.01. The standard deviation is the square root of the variance, so:

0.01 ≥ √(Var(H))

Squaring both sides:

0.0001 ≥ Var(H)

From part (1b), we know that Var(H) ≤ 4 / n, so this is guaranteed as soon as:

0.0001 ≥ 4 / n

Solving for n:

n ≥ 4 / 0.0001 = 40,000

Therefore, the smallest possible value of n is 40,000.
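The same arithmetic can be written as a small helper. This is a sketch with a hypothetical function name, `min_n_for_sd`. It uses exact fractions (inputs passed as strings) to avoid floating-point rounding when taking the ceiling.

```python
import math
from fractions import Fraction


def min_n_for_sd(var_bound, sd_target):
    """Smallest integer n such that var_bound / n <= sd_target**2."""
    var_bound = Fraction(var_bound)
    sd_target = Fraction(sd_target)
    return math.ceil(var_bound / sd_target**2)


# Variance bound 4 and target standard deviation 0.01, as in the problem.
print(min_n_for_sd("4", "0.01"))  # 40000
```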

3. The Chebyshev inequality states that, for any constant c > 0, the probability that a random variable deviates from its mean by at least c is at most its variance divided by c^2. Applied to H, whose mean is h:

P(|H - h| ≥ 0.02) ≤ Var(H) / (0.02)^2

We want to be at least 96% sure that our estimate is within 0.02 of the true mean h, so this probability must be at most 1 - 0.96 = 0.04.

Substituting the bound Var(H) ≤ 4 / n from part (1b):

P(|H - h| ≥ 0.02) ≤ (4 / n) / 0.0004 = 10,000 / n

It therefore suffices to have:

10,000 / n ≤ 0.04

Solving for n:

n ≥ 10,000 / 0.04 = 250,000

Therefore, the minimum value of n is 250,000.
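As a cross-check, the Chebyshev requirement Var(H) / c^2 ≤ 1 - confidence, combined with Var(H) ≤ 4 / n, can be solved for n directly. The sketch below uses a hypothetical helper name, `chebyshev_min_n`, and exact fractions so that the ceiling is not affected by rounding.

```python
import math
from fractions import Fraction


def chebyshev_min_n(var_bound, tolerance, confidence):
    """Smallest n with var_bound / (n * tolerance**2) <= 1 - confidence."""
    var_bound = Fraction(var_bound)
    tolerance = Fraction(tolerance)
    failure_prob = 1 - Fraction(confidence)
    return math.ceil(var_bound / (tolerance**2 * failure_prob))


# Variance bound 4, tolerance 0.02, confidence 96%, as in the problem.
print(chebyshev_min_n("4", "0.02", "0.96"))  # 250000
```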

4. If X is uniformly distributed on [h - 3, h + 3], for some unknown h, then Var(X) = 6^2 / 12 = 3, because a uniform distribution on an interval of length L has variance L^2 / 12. By the Central Limit Theorem, for large n the sample mean H is approximately normal with mean h and variance 3/n.

A 95% confidence interval for h is given by:

H ± (1.96 * √(Var(H)))

Since Var(H) = 3/n, the confidence interval becomes:

H ± (1.96 * √(3/n))

Equivalently, in interval form:

[H - 1.96 * √3 / √n, H + 1.96 * √3 / √n]

This is the most appropriate expression for a 95% confidence interval for h.
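For a concrete data set, the interval can be computed as follows. This is a minimal sketch, assuming the samples come from a uniform distribution on an interval of length 6, whose variance is 6^2 / 12 = 3. The function name `uniform_clt_interval` and the sample values are hypothetical.

```python
import math


def uniform_clt_interval(samples, half_width=3.0, z=1.96):
    """Approximate 95% CLT interval for the center h of Uniform[h - 3, h + 3]."""
    n = len(samples)
    sample_mean = sum(samples) / n
    # Variance of a uniform distribution on an interval of length 2 * half_width.
    variance = (2 * half_width) ** 2 / 12
    margin = z * math.sqrt(variance / n)
    return sample_mean - margin, sample_mean + margin


# Made-up observations, for illustration only.
low, high = uniform_clt_interval([4.1, 6.3, 5.2, 7.0])
print(low, high)
```

With only four observations the normal approximation is rough; the formula is meant for large n.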

Answer 2

1.

a) To find the expected value of H (E[H]), we can use the linearity of expectations. Since the Xi's are independent and identically distributed (i.i.d.), the expected value of the sample mean is the same as the expected value of each individual random variable. Therefore,

E[H] = E[X1] = h.

b) To find the upper bound for Var(H), we know that the variance of each Xi is at most 4. The variance of H can be calculated as follows:

Var(H) = Var(1/n * (X1 + X2 + ... + Xn)) = (1/n^2) * (Var(X1) + Var(X2) + ... + Var(Xn)).

Since all the Xi's have the same distribution, we can rewrite Var(H) as:

Var(H) = (1/n^2) * (n * Var(X)) = Var(X) / n.

Given that Var(X) ≤ 4, we can conclude that:

Var(H) ≤ 4 / n.

2.

To guarantee that the standard deviation of H is at most 0.01, we need to ensure that Var(H) ≤ (0.01)^2. From part 1b, we have:

Var(H) ≤ 4 / n ≤ (0.01)^2.

Simplifying the inequality, we get:

4 / n ≤ 0.0001.

Solving for n, we find:

n ≥ 4 / 0.0001.

Therefore, the minimum value of n is 40,000.

3.

Using the Chebyshev inequality, we want to find the minimum value of n that ensures that the probability that H deviates from h by more than 0.02 is less than or equal to 4% (1 - 0.96).

Chebyshev's inequality states that:

P(|H - h| ≥ c) ≤ Var(H) / c^2.

We want to find the minimum value of n such that:

Var(H) / (0.02)^2 ≤ 0.04.

From part 1b, Var(H) ≤ 4 / n, so it suffices that:

(4 / n) / (0.02)^2 = 10,000 / n ≤ 0.04.

Solving for n, we find:

n ≥ 10,000 / 0.04 = 250,000.

Therefore, the minimum value of n is 250,000.

4.

If X is uniformly distributed on [h - 3, h + 3], and we have a large enough sample size, we can use the Central Limit Theorem to approximate the sample mean H as a normally distributed random variable.

According to the Central Limit Theorem, when n is large enough (e.g., n ≥ 30) and the underlying population distribution is not too skewed, the distribution of H can be approximated by a normal distribution with mean h and variance Var(H) = Var(X) / n.

To construct a 95% confidence interval for h, we center the interval at the point estimate H and use the standard deviation of H as the standard error (SE). A uniform distribution on an interval of length 6 has variance 6^2 / 12 = 3, so:

SE = sqrt(Var(H)) = sqrt(Var(X) / n) = sqrt(3 / n).

Since we want a 95% confidence interval, we can use the Z-score associated with a 95% confidence level (Z = 1.96) to calculate the margin of error (ME):

ME = Z * SE = 1.96 * sqrt(3 / n).

Therefore, the 95% confidence interval for h is:

[H - ME, H + ME] = [H - 1.96 * sqrt(3 / n), H + 1.96 * sqrt(3 / n)].

Note: The value 1.96 comes from the normal table: the standard normal CDF at 1.96 is approximately 0.975, which leaves 2.5% in each tail and 95% in the middle.
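One way to sanity-check a CLT-based interval is to simulate its coverage. The sketch below assumes X is uniform on [h - 3, h + 3], which has variance 6^2 / 12 = 3, and counts how often H ± 1.96 * sqrt(3 / n) contains h. The function name `coverage_rate` and the values of h, n, trials, and seed are hypothetical.

```python
import math
import random


def coverage_rate(h, n, trials, seed=0, z=1.96):
    """Fraction of simulated intervals H +/- z * sqrt(3 / n) that contain h."""
    rng = random.Random(seed)
    margin = z * math.sqrt(3.0 / n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.uniform(h - 3.0, h + 3.0) for _ in range(n)) / n
        # The interval contains h exactly when |H - h| <= margin.
        if abs(sample_mean - h) <= margin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage_rate(h=10.0, n=50, trials=10_000))
```

For moderately large n the printed rate should be close to 0.95.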