posted by mathstudent on .
Sigma is the standard deviation of a population of size N.
S is the standard deviation of a sample of size n drawn from within the population.
What is the estimated value of S^2?
If the population was infinitely large (size N = infinity), what would the estimated value of S^2 be?
I meant "expected" value, not "estimated" value. Sorry about that.
As far as I know (statistics is not really my specialty), the sigma of the sample depends only on the sample size, not the size of the population the sample is chosen from.
Therefore in my ignorance I would say:
S^2 sample = sigma^2 of population / n
sigma sample = sigma population/sqrt(n)
Damon, that can't be right. As n approaches infinity, S^2 should approach sigma^2.
Also, the Wikipedia entry does use both the sample size and the population size in its formula, which is one reason I wanted to see it derived.
You are right. I think I have that backwards and it is too simple anyway. I hope a statistics expert comes by here.
You can formally write everything down in terms of the probability distribution function. Let's say the population consists of N elements and each element can be in some state denoted by a continuous variable x, distributed according to the same probability density p(x).
If all the variables are independent, the joint probability distribution factorizes:
p(x1, x2, ...,xN) = p(x1)p(x2)...p(xN)
If you measure S^2, you take n of the variables xi, say x1, x2, ..., xn, and compute S^2, the square of the sample standard deviation, in the usual way:
S^2 = <x^2> - <x>^2
where <x> and <x^2> denote averages over the n numbers x1, x2, ..., xn, not averages over p(x), because S^2 depends on the actual numbers in the sample. So we don't know in advance what S^2 will be, but we can compute its probability distribution, expectation value, etc. in terms of the function p(x).
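A minimal Python sketch of the sample quantity S^2 as defined here: the mean of the squares minus the square of the mean, both taken over the n sample values with a 1/n factor. The function name `sample_s2` is hypothetical, chosen for illustration.

```python
def sample_s2(xs):
    """Return S^2 = <x^2> - <x>^2, averaging over the n values in xs (1/n convention)."""
    n = len(xs)
    mean = sum(xs) / n                      # <x>
    mean_sq = sum(x * x for x in xs) / n    # <x^2>
    return mean_sq - mean * mean

# The sample 2, 4, 4, 4, 5, 5, 7, 9 has <x> = 5 and <x^2> = 29, so S^2 = 29 - 25.
print(sample_s2([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```

For data with a large mean, the algebraically equivalent form (1/n) * sum of (xi - <x>)^2 is numerically safer in floating point.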
The expectation value is given by:
Integral dx1 dx2...dxn
p(x1)p(x2)...p(xn)[<x^2> - <x>^2].
Substitute into this:
<x^2> = 1/n(x1^2 + x2^2 + ...+xn^2)
<x>^2 = 1/n^2(x1 + x2 + ...+xn)^2 =
1/n^2(x1^2 + x2^2 + ...+xn^2) +
1/n^2 (sum over xi xj for i not equal to j)
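The expansion of <x>^2 into n diagonal terms and n(n-1) off-diagonal terms can be checked exactly with rational arithmetic. This is a small sketch assuming integer sample values (so `Fraction` stays exact); `check_expansion` is a hypothetical helper name.

```python
from fractions import Fraction

def check_expansion(xs):
    """Check <x>^2 = (1/n^2)(sum of xi^2) + (1/n^2)(sum of xi*xj over i != j), exactly."""
    n = len(xs)
    lhs = Fraction(sum(xs), n) ** 2                          # <x>^2
    diag = Fraction(sum(x * x for x in xs), n * n)            # the n terms with i == j
    off = Fraction(sum(xs[i] * xs[j]
                       for i in range(n) for j in range(n)
                       if i != j), n * n)                     # the n(n-1) terms with i != j
    return lhs == diag + off

print(check_expansion([1, 2, 3, 7]))   # True
print(check_expansion([-4, 0, 5]))     # True
```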
Now let's do the integrals. Let's use the notation <<f(x)>> for an average relative to p(x). So
<<x>> means Integral dx x p(x)
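Then you see that you only get terms like <<x>>^2 and <<x^2>>, and you just need to count how many of each there are and what their prefactors are. The term <x^2> gives <<x^2>>. The n diagonal terms of <x>^2 give (1/n)<<x^2>>. The n(n-1) off-diagonal terms xi xj (i not equal to j) each integrate to <<x>>^2, because the variables are independent, so together they give (1-1/n)<<x>>^2.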
You should find:
<<S^2>> = (1-1/n) [<<x^2>> - <<x>>^2]
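The result can be checked exactly for a simple discrete case. As an assumption for illustration, p(x) is taken to be a fair six-sided die, so the integrals become sums over equally likely values; the derivation only uses independence and linearity, so it carries over unchanged. The sketch averages S^2 over every ordered sample of size n and compares with (1-1/n) sigma^2. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def sample_s2(xs):
    """S^2 = <x^2> - <x>^2 over the sample, in exact rational arithmetic."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return Fraction(sum(x * x for x in xs), n) - mean * mean

def exact_expected_s2(values, n):
    """Average S^2 over all len(values)**n equally likely ordered samples
    (independent draws from a uniform distribution on `values`)."""
    samples = list(product(values, repeat=n))
    return sum(sample_s2(s) for s in samples) / len(samples)

die = range(1, 7)
m1 = Fraction(sum(die), 6)                  # <<x>>   = 7/2
m2 = Fraction(sum(x * x for x in die), 6)   # <<x^2>> = 91/6
sigma2 = m2 - m1 * m1                        # sigma^2 = 35/12

for n in (2, 3, 4):
    lhs = exact_expected_s2(die, n)
    rhs = (1 - Fraction(1, n)) * sigma2
    print(n, lhs, rhs, lhs == rhs)
```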
Now, the term in the square brackets is sigma^2. It is what S^2 becomes for an infinite sample size, because averaging over an infinite sample gives the exact average, which is also given by the integral over the probability distribution.
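A quick simulation illustrates that a single large sample gives S^2 close to sigma^2, since (1-1/n) goes to 1. The distribution is an assumption chosen for illustration: p(x) uniform on [0, 1), whose variance is 1/12.

```python
import random

def sample_s2(xs):
    """S^2 with the 1/n convention, computed as the mean squared deviation from <x>."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

# Assumption for illustration: p(x) is uniform on [0, 1), so sigma^2 = 1/12 (about 0.0833).
rng = random.Random(2)
results = {}
for n in (10, 1_000, 100_000):
    xs = [rng.random() for _ in range(n)]   # one sample of size n
    results[n] = sample_s2(xs)
    print(n, results[n], 1 / 12)
```

Small samples scatter widely around (1-1/n)/12; the largest one should land very close to 1/12.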
The factor (1-1/n) explains why, when estimating the standard deviation from a finite sample, you divide by n-1 instead of n inside the square root. On average, S^2 equals the true variance sigma^2 times
(1-1/n), so you correct for this by dividing by that factor, i.e. multiplying by n/(n-1).
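A sketch of the correction described above (usually called Bessel's correction): multiply S^2 by n/(n-1). Python's `statistics.variance` divides by n-1, so it should agree. The Monte Carlo part assumes a standard normal p(x) (sigma^2 = 1) purely for illustration; `corrected_s2` is a hypothetical name.

```python
import math
import random
import statistics

def corrected_s2(xs):
    """S^2 (1/n convention) multiplied by n/(n-1), i.e. Bessel's correction."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n   # the uncorrected S^2
    return s2 * n / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
# statistics.variance divides by n - 1, so it should agree with corrected_s2.
print(corrected_s2(data), statistics.variance(data))
print(math.isclose(corrected_s2(data), statistics.variance(data)))

# Averaged over many samples of size n = 5 from a standard normal (sigma^2 = 1),
# the corrected estimate should come out near 1 instead of near (1 - 1/5) = 0.8.
rng = random.Random(1)
trials = 100_000
est = sum(corrected_s2([rng.gauss(0.0, 1.0) for _ in range(5)])
          for _ in range(trials)) / trials
print(est)
```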
Thank you!