Variance Estimate

Central tendency measures describe the central values of a sample, whereas dispersion measures describe how the data are spread within it. Together they make up descriptive statistics. The population variance is estimated by the sample variance S², which is given by:

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$$

where $x_i$ are the N observations and $\bar{x}$ is the sample mean.

Do we use N-1 or N?

In calculating the sample variance (or the sample mean squared error), you might have noticed that the sum of the squared deviations of each observation from the sample mean is divided by the number of observations minus one (N-1). Why N-1? If the sum were divided simply by N, the variance would be the average of the squared deviations from the sample mean, which seems natural. However, standard textbooks use N-1 as the denominator.
This is because the population variance is estimated from the sample mean ($\bar{x}$) and from the deviation of each measurement from the sample mean. If we lacked any one of these values (the mean or a single deviation), we could calculate it from the rest of the data. So, with N measurements (data points), only N-1 of them are free to vary in the calculation of the sample variance. For example, when we know the mean, we need to know only N-1 observations: the missing observation can be reproduced from the remaining N-1 observations and the sample mean. Therefore N-1 is the number of degrees of freedom of our data.

The sample variance calculated with N-1 degrees of freedom is an unbiased estimator of the population variance. This can be proved mathematically. By definition, an estimator is unbiased if its expected value equals the parameter it estimates. Mathematically:

$$E[S^2] = \sigma^2 \quad \text{(population variance)}$$
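The unbiasedness claim can be checked with a short derivation. This is a sketch of the standard argument, assuming the observations $x_1, \dots, x_N$ are independent draws from a population with mean $\mu$ and variance $\sigma^2$; the symbols $\mu$, $\sigma^2$, $x_i$ and $\bar{x}$ are notation introduced here for illustration.

$$
\begin{aligned}
\sum_{i=1}^{N}(x_i-\bar{x})^2 &= \sum_{i=1}^{N}(x_i-\mu)^2 - N(\bar{x}-\mu)^2, \\
E\Big[\sum_{i=1}^{N}(x_i-\mu)^2\Big] &= N\sigma^2, \\
E\big[N(\bar{x}-\mu)^2\big] &= N\cdot\frac{\sigma^2}{N} = \sigma^2, \\
E\Big[\sum_{i=1}^{N}(x_i-\bar{x})^2\Big] &= N\sigma^2 - \sigma^2 = (N-1)\,\sigma^2 .
\end{aligned}
$$

Dividing the sum of squared deviations by N-1 therefore gives an expected value of exactly $\sigma^2$, whereas dividing by N gives $\frac{N-1}{N}\sigma^2$, which underestimates the population variance on average.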
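The two ideas above, degrees of freedom and unbiasedness, can also be illustrated numerically. The following is a minimal Python sketch, not a definitive implementation: the function names `variance`, `recover_missing` and `average_estimates`, the example data, and the choice of a normal population with standard deviation 2 are all assumptions made for illustration.

```python
import random


def variance(data, ddof):
    """Sum of squared deviations from the sample mean, divided by N - ddof.

    ddof=0 divides by N; ddof=1 divides by N-1.
    """
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - ddof)


def recover_missing(known, mean, n):
    """Recover the one missing observation from the other N-1 and the sample mean."""
    return n * mean - sum(known)


def average_estimates(n, trials, sigma, seed=0):
    """Average the N and N-1 variance estimates over many simulated samples.

    Samples are drawn from a normal population with mean 0 and standard
    deviation sigma (an assumption of this sketch).
    """
    rng = random.Random(seed)
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_n += variance(sample, ddof=0)
        total_n_minus_1 += variance(sample, ddof=1)
    return total_n / trials, total_n_minus_1 / trials


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 5.0, 10.0]
    mean = sum(data) / len(data)

    # Degrees of freedom: the last value is fixed by the other four and the mean.
    print("recovered observation:", recover_missing(data[:-1], mean, len(data)))

    # Unbiasedness: the true population variance here is sigma**2 = 4.0.
    by_n, by_n_minus_1 = average_estimates(n=5, trials=20000, sigma=2.0)
    print("average estimate dividing by N:  ", by_n)
    print("average estimate dividing by N-1:", by_n_minus_1)
```

With small samples the average of the N-1 estimates should settle near the true variance, while the average of the N estimates should fall short by roughly the factor (N-1)/N. Python's standard library follows the same convention: `statistics.variance` divides by N-1 and `statistics.pvariance` divides by N.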