The standard deviation is similar to the mean deviation in that here too the deviations are measured from the mean.
At the same time, the standard deviation is preferred to the mean deviation or the quartile deviation or the range because it has desirable mathematical properties.
Before defining the concept of the standard deviation, we introduce another concept viz. variance.
Example 1:
X | X-μ | (X-μ)²
20 | 20-18 = 2 | 4
15 | 15-18 = -3 | 9
19 | 19-18 = 1 | 1
24 | 24-18 = 6 | 36
16 | 16-18 = -2 | 4
14 | 14-18 = -4 | 16
108 | Total | 70
Solution:
The second column shows the deviations from the mean. The third or the last column shows the squared deviations, the sum of which is 70. The arithmetic mean of the squared deviations is:
70/6 = 11.67 approx.
This mean of the squared deviations is known as the variance. It may be noted that this variance is described by different terms that are used interchangeably: the variance of the distribution X; the variance of X; the variance of the distribution; and just simply, the variance. Symbolically,
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where σ² (called sigma squared) is used to denote the variance, μ is the mean and N is the number of observations.
Although the variance is a measure of dispersion, the unit of its measurement is (points)². If a distribution relates to the income of families, then the unit of variance is (Rs)² and not rupees.
Similarly, if another distribution pertains to marks of students, then the unit of variance is (marks)². To overcome this inadequacy, the square root of the variance is taken, which yields a better measure of dispersion known as the standard deviation. Taking our earlier example of individual observations, we take the square root of the variance:
$$\sigma = \sqrt{\sigma^2} = \sqrt{11.67} = 3.42 \text{ approx.}$$
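The steps of Example 1 can be retraced in a few lines of Python. This is only an illustrative sketch: the variable names are our own, and the variance is taken as the mean of the squared deviations (dividing by N), as in the text.

```python
import math

# Observations from Example 1
x = [20, 15, 19, 24, 16, 14]

n = len(x)
mu = sum(x) / n                               # arithmetic mean: 108 / 6 = 18
squared_devs = [(xi - mu) ** 2 for xi in x]   # third column of the table
variance = sum(squared_devs) / n              # mean of squared deviations: 70 / 6
sigma = math.sqrt(variance)                   # standard deviation, in the original units

print(round(variance, 2), round(sigma, 2))    # 11.67 3.42
```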
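In applied Statistics, the standard deviation is more frequently used than the variance. The formula for the standard deviation can also be written as:
$$\sigma = \sqrt{\frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2}$$
We use this formula to calculate the standard deviation from the individual observations given earlier.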
Example 2:
X | X²
20 | 400
15 | 225
19 | 361
24 | 576
16 | 256
14 | 196
108 | 2014
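Solution:
$$\sigma = \sqrt{\frac{2014}{6} - \left(\frac{108}{6}\right)^2} = \sqrt{335.67 - 324} = \sqrt{11.67} = 3.42 \text{ approx.}$$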
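The same result can be checked from the two column totals alone. The sketch below assumes the shortcut form σ = √(ΣX²/N − (ΣX/N)²), that is, the mean of the squares minus the square of the mean; the variable names are illustrative.

```python
import math

# Observations from Example 2 (the same six values as before)
x = [20, 15, 19, 24, 16, 14]

n = len(x)
sum_x = sum(x)                        # total of the X column: 108
sum_x2 = sum(xi * xi for xi in x)     # total of the X² column: 2014

# Mean of the squares minus the square of the mean, then the square root
sigma = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

print(sum_x, sum_x2, round(sigma, 2))  # 108 2014 3.42
```

The advantage of this form is that no deviation from the mean has to be computed for each observation.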
Example 3:
The following distribution relates to marks obtained by students in an examination. Calculate the standard deviation.
Marks | Number of Students
0-10 | 1
10-20 | 3
20-30 | 6
30-40 | 10
40-50 | 12
50-60 | 11
60-70 | 6
70-80 | 3
80-90 | 2
90-100 | 1
Solution:
Marks | Frequency (f) | Mid-points | Deviations (d)/10 = d' | fd' | fd'²
0-10 | 1 | 5 | -5 | -5 | 25
10-20 | 3 | 15 | -4 | -12 | 48
20-30 | 6 | 25 | -3 | -18 | 54
30-40 | 10 | 35 | -2 | -20 | 40
40-50 | 12 | 45 | -1 | -12 | 12
50-60 | 11 | 55 | 0 | 0 | 0
60-70 | 6 | 65 | 1 | 6 | 6
70-80 | 3 | 75 | 2 | 6 | 12
80-90 | 2 | 85 | 3 | 6 | 18
90-100 | 1 | 95 | 4 | 4 | 16
Total | 55 | | | -45 | 231
In the case of a frequency distribution where the individual values are not known, we use the mid-points of the class intervals. Thus, the formula used for calculating the standard deviation is as given below:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}}$$
where m_i is the mid-point of the ith class interval, μ is the mean of the distribution, f_i is the frequency of the ith class, N is the total number of observations and K is the number of classes. This formula requires that the mean μ be calculated and that the deviations (m_i − μ) be obtained for each class. To avoid this inconvenience, the above formula can be modified as:
$$\sigma = C \sqrt{\frac{\sum_{i=1}^{K} f_i d_i^2}{N} - \left(\frac{\sum_{i=1}^{K} f_i d_i}{N}\right)^2}$$
where C is the width of the class interval, f_i is the frequency of the ith class, d_i is the deviation of the ith class mid-point from an assumed origin, expressed in units of the class interval (the column d' in the table), and N is the total number of observations.
Applying this formula to the table given earlier,
$$\sigma = 10 \sqrt{\frac{231}{55} - \left(\frac{-45}{55}\right)^2} = 10 \sqrt{4.2 - 0.669} = 10 \sqrt{3.531}$$
= 18.8 marks
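The table for Example 3 can be reproduced programmatically. The following is a minimal sketch of the step-deviation calculation, assuming an origin of 55 and a class width of 10 as in the table; the names `A`, `C` and `d` are our own choices.

```python
import math

# Class mid-points and frequencies from Example 3
midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
freqs = [1, 3, 6, 10, 12, 11, 6, 3, 2, 1]

A = 55   # assumed origin: mid-point of the 50-60 class
C = 10   # width of each class interval

# Step deviations d' = (mid-point - A) / C; exact here because every
# mid-point differs from A by a whole multiple of C
d = [(m - A) // C for m in midpoints]

N = sum(freqs)                                          # 55
sum_fd = sum(f * di for f, di in zip(freqs, d))         # -45
sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))   # 231

sigma = C * math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

print(N, sum_fd, sum_fd2, round(sigma, 1))              # 55 -45 231 18.8
```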
When it becomes clear that the actual mean would turn out to be a fraction, calculating deviations from the mean would be too cumbersome. In such cases, an assumed mean is used and the deviations from it are calculated. While the mid-point of any class can be taken as an assumed mean, it is advisable to choose the mid-point of the class that makes the calculations least cumbersome. Guided by this consideration, in Example 3 we have chosen the mid-point 55 as the assumed mean and, accordingly, deviations have been taken from it. It will be seen that the calculations are considerably simplified.
Uses of the Standard Deviation
The standard deviation is a frequently used measure of dispersion. It enables us to determine how far individual items in a distribution deviate from its mean. In a symmetrical, bell-shaped curve:
- About 68 percent of the values in the population fall within ±1 standard deviation of the mean.
- About 95 percent of the values fall within ±2 standard deviations of the mean.
- About 99.7 percent of the values fall within ±3 standard deviations of the mean.
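These percentages can be checked numerically if the symmetrical, bell-shaped curve is taken to be a normal distribution (an assumption on our part; the text does not name the distribution). The sketch uses `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist

std_normal = NormalDist()   # normal curve with mean 0 and standard deviation 1

# Share of values lying within k standard deviations of the mean
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within ±{k} standard deviation(s): {share:.4f}")
# within ±1 standard deviation(s): 0.6827
# within ±2 standard deviation(s): 0.9545
# within ±3 standard deviation(s): 0.9973
```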
The standard deviation is an absolute measure of dispersion as it measures variation in the same units as the original data. As such, it is not a suitable measure for comparing two or more distributions that differ in their units or in their means.
For this purpose, we should use a relative measure of dispersion. One such measure of relative dispersion is the coefficient of variation, which relates the standard deviation and the mean such that the standard deviation is expressed as a percentage of mean. Thus, the specific unit in which the standard deviation is measured is done away with and the new unit becomes percent.
Symbolically, CV (coefficient of variation) = (σ ÷ μ) × 100
Example 4:
In a small business firm, two typists are employed: typist A and typist B.
Typist A types out, on an average, 30 pages per day with a standard deviation of 6.
Typist B, on an average, types out 45 pages with a standard deviation of 10. Which typist shows greater consistency in his output?
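Solution:
Coefficient of variation for typist A = (6 ÷ 30) × 100 = 20 percent
Coefficient of variation for typist B = (10 ÷ 45) × 100 = 22.2 percent
These calculations clearly indicate that although typist B types out more pages, there is a greater variation in his output as compared to that of typist A.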
We can say this in a different way: Though typist A's daily output is much less, he is more consistent than typist B. The usefulness of the coefficient of variation becomes clear in comparing two groups of data having different means, as has been the case in the above example.
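The comparison of the two typists can be written as a short Python sketch. The helper `coefficient_of_variation` is a hypothetical name introduced here for illustration; it simply expresses the standard deviation as a percentage of the mean.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(6, 30)    # typist A: mean 30 pages, sd 6
cv_b = coefficient_of_variation(10, 45)   # typist B: mean 45 pages, sd 10

print(round(cv_a, 1), round(cv_b, 1))     # 20.0 22.2

# The smaller coefficient of variation marks the more consistent typist
more_consistent = "A" if cv_a < cv_b else "B"
print(more_consistent)                    # A
```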
Standardized Variable, Standard Scores
The variable Z = (x − x̄)/s or (x − μ)/σ, which measures the deviation from the mean in units of the standard deviation, is called a standardised variable.
Since both the numerator and the denominator are in the same units, a standardised variable is independent of units used. If deviations from the mean are given in units of the standard deviation, they are said to be expressed in standard units or standard scores.
Through this concept of standardised variable, proper comparisons can be made between individual observations belonging to two different distributions whose compositions differ.
Example 5:
A student has scored 68 marks in Statistics for which the average marks were 60 and the standard deviation was 10.
In the paper on Marketing, he scored 74 marks for which the average marks were 68 and the standard deviation was 15. In which paper, Statistics or Marketing, was his relative standing higher?
Solution:
The standardised variable Z = (x − x̄) ÷ s measures the deviation of x from the mean x̄ in terms of the standard deviation s.
For Statistics, Z = (68 − 60) ÷ 10 = 0.8
For Marketing, Z = (74 − 68) ÷ 15 = 0.4
Since the standard score is 0.8 in Statistics as compared to 0.4 in Marketing, his relative standing was higher in Statistics.
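As a quick check of Example 5, the standard scores can be computed with a small helper. The function name `z_score` is our own and is used only for illustration.

```python
def z_score(x, mean, sd):
    """Deviation of x from the mean, in units of the standard deviation."""
    return (x - mean) / sd

z_statistics = z_score(68, 60, 10)   # 8 / 10
z_marketing = z_score(74, 68, 15)    # 6 / 15

print(round(z_statistics, 1), round(z_marketing, 1))   # 0.8 0.4
```

Although the raw mark is higher in Marketing (74 against 68), the standard score is higher in Statistics, which is why the relative standing is better there.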
Example 6:
Convert the set of numbers 6, 7, 5, 10 and 12 into standard scores:
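Solution:
The mean of the five numbers is (6 + 7 + 5 + 10 + 12) ÷ 5 = 8. The squared deviations from the mean are 4, 1, 9, 4 and 16, which sum to 34, so the standard deviation is √(34 ÷ 5) = √6.8 = 2.61 approx. Dividing each deviation from the mean by the standard deviation gives the standard score; for example, (6 − 8) ÷ 2.61 = -0.77.
Thus the standard scores for 6, 7, 5, 10 and 12 are -0.77, -0.38, -1.15, 0.77 and 1.53, respectively.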
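The conversion can be sketched with Python's standard `statistics` module. Note the assumption that the standard deviation is the population form (dividing by N, `pstdev`); this is the form that reproduces the scores quoted above.

```python
from statistics import mean, pstdev

values = [6, 7, 5, 10, 12]

mu = mean(values)        # 8
sigma = pstdev(values)   # population standard deviation: sqrt(34 / 5)

# Standard score of each value: deviation from the mean in units of sigma
z = [round((v - mu) / sigma, 2) for v in values]

print(z)   # [-0.77, -0.38, -1.15, 0.77, 1.53]
```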