6 Range, Variance, Standard Deviation
Range, variance, and standard deviation (std dev) are all measures of dispersion, also called spread or variability: how spread out the data is.
Another measure of dispersion that we will study later is the interquartile range.
These measures of dispersion are for univariate, quantitative data only.
E.g. The example data sets below all have the same mean and median, namely, 7. It's the "spread" or "variability" that's different: the range of the data points and how far they tend to be from their center.
Data Set A: 3, 5, 6.4, 6.7, 6.8, 7, 7.3, 7.4, 7.7, 8.7, 11
Data Set B: 1.25, 1.3, 1.35, 1.4, 1.4, 1.7, 3.1, 4, 7, 7, 9.7, 12, 12.05, 12.35, 12.4, 12.55, 12.7, 12.75
Data Set C: -16.9, -16.5, -15.9, -15.8, -15.75, -15.7, -15.7, -15.5, -12, 6.65, 6.8, 6.9, 7, 7, 7.2, 7.4, 7.8, 27, 29.4, 29.6, 29.7, 29.75, 29.75, 29.9, 29.9, 30
Range
Range = (max data value) - (min data value)
computed the same way for population & sample data
E.g. range of A : 11 - 3 = 8
range of B : 12.75 - 1.25 = 11.5
range of C : 30 - (-16.9) = 46.9
Note that the range is very sensitive to outliers.
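The range formula can be checked with a few lines of Python. This is a minimal sketch: the helper name `data_range` and the extra value 100 used to illustrate outlier sensitivity are assumptions made for this example, not part of the notes.

```python
# Data sets A and B from the example above.
data_a = [3, 5, 6.4, 6.7, 6.8, 7, 7.3, 7.4, 7.7, 8.7, 11]
data_b = [1.25, 1.3, 1.35, 1.4, 1.4, 1.7, 3.1, 4, 7, 7, 9.7,
          12, 12.05, 12.35, 12.4, 12.55, 12.7, 12.75]


def data_range(values):
    """Return (max data value) - (min data value)."""
    return max(values) - min(values)


print(data_range(data_a))  # 11 - 3
print(data_range(data_b))  # 12.75 - 1.25

# Outlier sensitivity: a single extreme value (100 is made up for
# illustration) changes the range drastically.
with_outlier = data_a + [100]
print(data_range(with_outlier))  # 100 - 3
```

Adding one unusual value moves the range of A from 8 to 97, even though the other eleven data points are unchanged.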
Variance
The variance will measure how far the data tends to be from its mean. We will define the sample variance $s^2$ and then the
population variance $\sigma^2$. We need to do a little work before we can define either.
If $x$ is a data point in a sample, then the deviation of $x$ (from the mean) is $x - \bar{x}$.
E.g. For data set A, $\bar{x} = 7$. Thus the deviation of the data point 3 from the mean is 3 - 7 = -4.
The deviation measures how far x is from the mean. You might think that we could use the average deviation to measure how
far the data tends to be from its mean, but there's a problem.
Problem: the sum of all the deviations is always 0.
Solution: use the squares of the deviations to measure how far data points are from $\bar{x}$.
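The problem and its fix are easy to see numerically. The sketch below uses data set A; the variable names are chosen for this example only.

```python
# Data set A from the example above.
data_a = [3, 5, 6.4, 6.7, 6.8, 7, 7.3, 7.4, 7.7, 8.7, 11]

mean_a = sum(data_a) / len(data_a)

# Deviation of each data point from the mean.
deviations = [x - mean_a for x in data_a]

# Negative and positive deviations cancel, so their sum is zero
# (up to floating-point rounding) no matter how spread out the data is.
print(sum(deviations))

# Squaring makes every term non-negative, so nothing cancels.
squared = [d ** 2 for d in deviations]
print(sum(squared))
```

Because the plain deviations always cancel, their average says nothing about spread; the sum of squared deviations grows as the data moves away from the mean.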
Corwin STAT 200 17
©2011-2020 Stephen Corwin
The sample variance is
$$s^2 = \frac{\text{sum of (deviations squared)}}{\text{number of data points} - 1} = \frac{\sum (x - \bar{x})^2}{n - 1}$$
The denominator is $n - 1$ because it can be shown that if $n$ is used, then for samples that are small compared to the size of the
population, the result is a biased estimator for $\sigma^2$, while using $n - 1$ makes it unbiased. Again, "unbiased" means that if many
samples are taken and $s^2$ is computed for each one, then we can expect the average value of $s^2$ to be very close to the population
variance.
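The effect of the denominator can be illustrated by brute force. The sketch below is an idealized setup, not taken from the notes: it assumes a made-up population {1, 2, 3}, samples of size 2 drawn with replacement, and every possible sample being equally likely. It averages both versions of the estimator over all possible samples.

```python
from itertools import product

population = [1, 2, 3]   # hypothetical tiny population
n = 2                    # sample size

# Population variance: divide by N.
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


# Every ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average each estimator over all possible samples.
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2)             # population variance
print(avg_div_n_minus_1)  # matches the population variance
print(avg_div_n)          # comes out too small
```

Under these assumptions, dividing by n - 1 averages out to the population variance, while dividing by n underestimates it.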
The variances of data sets A, B, and C are (approximately) 4, 25, and 365, respectively.
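The stated values for A and B can be reproduced directly from the definition. In this sketch the function name `sample_variance` is an assumption of the example; `statistics.variance` from Python's standard library is used only as a cross-check.

```python
import statistics

data_a = [3, 5, 6.4, 6.7, 6.8, 7, 7.3, 7.4, 7.7, 8.7, 11]
data_b = [1.25, 1.3, 1.35, 1.4, 1.4, 1.7, 3.1, 4, 7, 7, 9.7,
          12, 12.05, 12.35, 12.4, 12.55, 12.7, 12.75]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


print(round(sample_variance(data_a), 3))  # about 4
print(round(sample_variance(data_b), 3))  # about 25

# The standard library's sample variance uses the same n - 1 denominator.
print(round(statistics.variance(data_a), 3))
```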
The population variance is
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
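The only computational difference from the sample variance is the denominator. As a sketch, suppose (purely for illustration) that data set A were an entire population rather than a sample; the function name `population_variance` is likewise an assumption of this example.

```python
import statistics

# Data set A, treated here as a whole population for illustration only.
data_a = [3, 5, 6.4, 6.7, 6.8, 7, 7.3, 7.4, 7.7, 8.7, 11]


def population_variance(values):
    """Sum of squared deviations from the mean mu, divided by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


pop_var = population_variance(data_a)
samp_var = statistics.variance(data_a)   # divides by n - 1 instead

print(round(pop_var, 3))
print(round(samp_var, 3))

# statistics.pvariance is the standard library's population version.
print(round(statistics.pvariance(data_a), 3))
```

For the same numbers, dividing by N always gives a slightly smaller result than dividing by n - 1.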
The units of the variance are the units of the data squared.
Larger variance corresponds beautifully to greater variability, but the units are wrong. E.g., if the data points represent the
number of shoes in a man's closet, then the units of the variance are "shoes squared."
We can fix this.
Standard deviation
The standard deviation is the positive square root of the variance:

For population data: $\sigma = \sqrt{\sigma^2}$
For sample data: $s = \sqrt{s^2}$
E.g. The standard deviations of data sets A, B, and C are (approximately) 2.0, 5.0, and 19.1, respectively.
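Taking the square root of the variance is a one-line step in code. The sketch below defines two helpers whose names are assumptions of this example, and cross-checks them against `statistics.stdev` and `statistics.pstdev` from Python's standard library.

```python
import math
import statistics

data_a = [3, 5, 6.4, 6.7, 6.8, 7, 7.3, 7.4, 7.7, 8.7, 11]
data_b = [1.25, 1.3, 1.35, 1.4, 1.4, 1.7, 3.1, 4, 7, 7, 9.7,
          12, 12.05, 12.35, 12.4, 12.55, 12.7, 12.75]


def sample_std_dev(values):
    """Positive square root of the sample variance (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def population_std_dev(values):
    """Positive square root of the population variance (N denominator)."""
    big_n = len(values)
    mu = sum(values) / big_n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / big_n)


print(round(sample_std_dev(data_a), 1))  # about 2.0
print(round(sample_std_dev(data_b), 1))  # about 5.0
```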
The units of the std dev are the units of the data.
Interpret the std dev as giving, roughly, the average distance of a data point from the mean.
E.g. To estimate how much change people carry around in their pockets or bags, Jake asked five randomly selected
people how much they had with them. The responses were $1.50, $1.70, $1.90, $2.10, and $2.30.
Find the mean, variance, and standard deviation of this sample, and interpret the standard deviation.
$x$      $x - \bar{x}$    $(x - \bar{x})^2$
1.50     -0.4             0.16
1.70     -0.2             0.04
1.90      0               0
2.10      0.2             0.04
2.30      0.4             0.16
$\sum (x - \bar{x})^2 = 0.4$

$\bar{x} = \frac{9.5}{5} = 1.9$
$s^2 = \frac{0.4}{5 - 1} = 0.1$
$s = \sqrt{0.1} \approx 0.316$
So the mean is 1.9, the variance is 0.1, and the standard deviation is 0.316 (to three decimal places). The standard deviation
is about 31.6¢, so we may say that on average, the amount of change people were carrying varied from the mean by about
31.6¢.
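The same calculation can be traced step by step in code. This is a sketch that mirrors the columns of the table; the variable names are chosen for this example.

```python
import math

# Jake's five responses, in dollars.
amounts = [1.50, 1.70, 1.90, 2.10, 2.30]

n = len(amounts)
mean = sum(amounts) / n

# One row per data point: x, deviation, squared deviation.
deviations = [x - mean for x in amounts]
squared = [d ** 2 for d in deviations]
for x, d, sq in zip(amounts, deviations, squared):
    print(f"{x:.2f}  {d:+.2f}  {sq:.2f}")

s2 = sum(squared) / (n - 1)   # sample variance
s = math.sqrt(s2)             # sample standard deviation

print(f"mean = {mean:.2f}")
print(f"s^2  = {s2:.3f}")
print(f"s    = {s:.3f}")
```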
E.g. John recorded the weights of 300 randomly chosen apples, while Jane recorded the weights of 300 randomly
chosen dogs. Which data set had the higher standard deviation?
The dogs, because the weights of dogs vary much more than the weights of apples.
Range, variance, standard deviation on the calculator
The TI's 1-Var Stats function, mentioned in the previous section, gives you much more than just the mean and median.
It has two pages of output; you must arrow down to get to the second page. Remember that the calculator does not know
whether you entered population or sample data, so it shows you values computed from both when they're computed
differently.
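The idea of reporting both versions can be sketched in software as well. The helper below is hypothetical: `one_var_stats` and its dictionary keys are names invented for this example and do not reproduce the calculator's actual output or labels. It returns the sample and population standard deviations side by side, since the code cannot know which kind of data it was given.

```python
import math
import statistics


def one_var_stats(values):
    """Hypothetical helper: summary statistics for one variable."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        # Both versions are reported; the reader picks the right one.
        "sample_std_dev": math.sqrt(sum_sq_dev / (n - 1)),
        "population_std_dev": math.sqrt(sum_sq_dev / n),
    }


amounts = [1.50, 1.70, 1.90, 2.10, 2.30]
stats = one_var_stats(amounts)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Statistics that are computed the same way for population and sample data (mean, median, min, max, range) appear once; only the standard deviation appears twice.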

How to explain standard deviation? Standard deviation measures how much the values in a data set tend to vary from their mean. A low standard deviation means that the data points cluster closely around the mean. A high standard deviation means that the data points are spread far from the mean.