# Module 30: Standard Deviation

This topic requires a leap of faith. It is one of the rare times when this textbook will say “don’t worry about why it’s true; just accept it.”

A normal distribution, often referred to as a bell curve, is symmetrical on the left and right, with the mean, median, and mode being the value in the center. There are lots of data values near the center, then fewer and fewer as the values get further from the center. A normal distribution describes the data in many real-world situations: heights of people, weights of people, errors in measurement, scores on standardized tests (IQ, SAT, ACT)…

One of the best ways to demonstrate the normal distribution is to drop balls through a board of evenly spaced pegs, as shown here.[1] Each time a ball hits a peg, it has a fifty-fifty chance of going left or right. For most balls, the numbers of lefts and rights are roughly equal, and the ball lands near the center. Only a few balls have an extremely lopsided number of lefts and rights, so there are not many balls at either end. As you can see, the distribution is not perfect, but it is approximated by the normal curve drawn on the glass.
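The peg board is easy to imitate on a computer. The following is a minimal sketch, not a model of the actual board in the picture: the function name `drop_balls`, the 1000 balls, and the 12 rows of pegs are all made-up values for illustration. Each ball makes a fifty-fifty choice at every row, and the column where it lands is simply the number of times it went right.

```python
import random
from collections import Counter

def drop_balls(num_balls, num_rows, seed=0):
    """Simulate a peg board: each ball bounces left or right at each row.

    Returns a Counter mapping the landing column (number of rights)
    to how many balls ended up there.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    landing = Counter()
    for _ in range(num_balls):
        # Count how many of the num_rows bounces went right (fifty-fifty each).
        rights = sum(rng.random() < 0.5 for _ in range(num_rows))
        landing[rights] += 1
    return landing

# Hypothetical sizes: 1000 balls and 12 rows of pegs.
counts = drop_balls(num_balls=1000, num_rows=12)
for column in range(13):
    # One '#' for every 5 balls gives a rough sideways histogram.
    print(f"{column:2d} {'#' * (counts[column] // 5)}")
```

The printed bars should pile up around the middle columns and thin out toward both ends, which is the same bell shape the balls make behind the glass.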

The standard deviation is a measure of the spread of the data: data with lots of numbers close to the mean has a smaller standard deviation, and data with numbers spaced further from the mean has a larger standard deviation. (In this textbook, you will be given the value of the standard deviation of the data and will never need to calculate it.) The standard deviation is a measuring stick for a particular set of data.

In a normal distribution…

• roughly 68% of the numbers are within 1 standard deviation above or below the mean
• roughly 95% of the numbers are within 2 standard deviations above or below the mean
• roughly 99.7% of the numbers are within 3 standard deviations above or below the mean

This 68-95-99.7 rule is called an empirical rule because it is based on observation rather than some formula. Nobody discovered a calculation to figure out the numbers 68, 95, and 99.7 until after the fact. Instead, statisticians looked at lots of different examples of normally distributed data and said “Mon Dieu, it appears that if you count up the data values that are within one standard deviation above or below the mean, you have about 68% of the data!” and so on.[2]
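For the curious, the after-the-fact calculation can be sketched in a few lines. This is optional and goes beyond what the module asks of you. It relies on a standard fact about the normal curve: the fraction of values within k standard deviations of the mean equals erf(k/√2), where erf is the "error function" found in Python's `math` module. The function name `fraction_within` is invented for this example.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The results round to the familiar 68, 95, and 99.7, which is why the rule is stated with the word "roughly."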

The following image is in Swedish, but you can probably decipher it because math is an international language.

Let’s go back to the ball-dropping experiment, and let’s assume that the standard deviation is three columns wide.[3] In the picture below, the green line marks the center of the distribution.

First, the two red lines are each three columns away from the center, which is one standard deviation above and below the center, so about 68% of the balls will land between the red lines.

Next, the two orange lines are another three columns farther away from the center, which is six columns or two standard deviations above and below the center, so about 95% of the balls will land between the orange lines.

And finally, the two purple lines are another three columns farther away from the center, which is nine columns or three standard deviations above and below the center, so about 99.7% of the balls will land between the purple lines. We can expect that 997 out of 1000 balls will land between the purple lines, leaving only 3 out of 1000 landing beyond the purple lines on either end.
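The arithmetic behind the colored lines can be written out as a short sketch. The three-column standard deviation comes from the text; the batch of 1000 balls and the names `line_positions` and `expected_balls` are assumptions made for illustration.

```python
SD_COLUMNS = 3  # from the text: the standard deviation is three columns wide

# Empirical-rule percentages for 1, 2, and 3 standard deviations.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def line_positions(k, sd=SD_COLUMNS):
    """Columns (relative to the center) of the two lines k standard deviations out."""
    return (-k * sd, k * sd)

def expected_balls(total_balls, k):
    """Approximate number of balls landing within k standard deviations."""
    return round(total_balls * EMPIRICAL_RULE[k])

# Hypothetical batch of 1000 balls.
for k, color in ((1, "red"), (2, "orange"), (3, "purple")):
    left, right = line_positions(k)
    print(f"{color} lines at columns {left} and {right}: "
          f"about {expected_balls(1000, k)} of 1000 balls between them")
```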

Here are Damian Lillard’s game results for points scored, in increasing order, for the games he played in the 2018-19 NBA season.[4] This is broken up into eight rows of ten numbers each, and this is a total of points.

, , , , , , , , , ,
, , , , , , , , , ,
, , , , , , , , , ,
, , , , , , , , , ,
, , , , , , , , , ,
, , , , , , , , , ,
, , , , , , , , , ,
, , , , , , , , ,

Exercises

This is a review of mean, median, and mode; you’ll need to know the mean in order to complete the standard deviation exercises that follow.

1. What is the mean of the data? (Round to the nearest tenth.)

2. What is the median of the data?

3. What is the mode of the data?

4. Do any of the mean, median, or mode seem misleading, or do all three seem to represent the data fairly well?

Here is a histogram of the data, arbitrarily grouped in seven equally-spaced intervals. It shows that the data roughly follows a bell-shaped curve, somewhat truncated on the left and with an outlier on the right.

If we enter the data into a spreadsheet program such as Microsoft Excel or Google Sheets, we can quickly find that the standard deviation is points.
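If you would rather not use a spreadsheet, Python's built-in `statistics` module does the same job. The sketch below uses a short made-up list of scores, not Damian Lillard's actual results. It also shows a detail spreadsheets share: there are two versions of the standard deviation, one for a sample (Excel's `STDEV.S`) and one for a whole population (Excel's `STDEV.P`). Which version this textbook's value comes from is not stated, so treat that choice as an open assumption.

```python
import statistics

# Hypothetical scores for illustration only; NOT the game results in the text.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)       # divides by n - 1, like STDEV.S
population_sd = statistics.pstdev(scores)  # divides by n, like STDEV.P

print(f"mean: {mean}")
print(f"sample standard deviation: {sample_sd:.2f}")
print(f"population standard deviation: {population_sd:.2f}")
```

For a data set with dozens of values, the two versions are usually close enough that the empirical-rule ranges barely change.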

Based on the empirical rule, we should expect approximately 68% of the results to be within one standard deviation above and below the mean.

Exercises

5. Determine the range of points scored that are within one standard deviation of the mean.

6. How many of the game results are within one standard deviation of the mean?

7. Is the previous answer close to 68% of the total number of game results?

And we should expect approximately 95% of the results to be within two standard deviations above and below the mean.

Exercises

8. Determine the range of points scored that are within two standard deviations of the mean.

9. How many of the game results are within two standard deviations of the mean?

10. Is the previous answer close to 95% of the total number of game results?

And we should expect approximately 99.7% of the results to be within three standard deviations above and below the mean.

Exercises

11. Determine the range of points scored that are within three standard deviations of the mean.

12. How many of the game results are within three standard deviations of the mean?

13. Is the previous answer close to 99.7% of the total number of game results?

Notice that we could think about the standard deviations like a measurement error or tolerance: the mean ± 1 standard deviation, the mean ± 2 standard deviations, the mean ± 3 standard deviations.
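The "mean plus or minus some number of standard deviations" idea translates directly into a small helper. This is a sketch under stated assumptions: the function name `within_k_sd` is invented, the data list is made up rather than taken from the game results, and the sample standard deviation is used.

```python
import statistics

def within_k_sd(data, k):
    """Return (low, high, count): the range mean ± k standard deviations
    and how many data values fall inside it."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (an assumption)
    low, high = mean - k * sd, mean + k * sd
    count = sum(low <= x <= high for x in data)
    return low, high, count

# Hypothetical data for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

for k in (1, 2, 3):
    low, high, count = within_k_sd(data, k)
    share = count / len(data)
    print(f"mean ± {k} SD: {low:.1f} to {high:.1f}, "
          f"{count} of {len(data)} values ({share:.0%})")
```

With only eight values the percentages cannot match 68, 95, and 99.7 closely; the rule works better as the data set grows and looks more bell-shaped.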

Exercises

For U.S. females, the average height is around inches ( ft in) and the standard deviation is inches. Use the empirical rule to fill in the blanks.

14. About 68% of the women should be between _______ and _______ inches tall.

15. About 95% of the women should be between _______ and _______ inches tall.

16. About 99.7% of the women should be between _______ and _______ inches tall.

For U.S. males, the average height is around inches ( ft in) and the standard deviation is inches. Use the empirical rule to fill in the blanks.

17. About 68% of the men should be between _______ and _______ inches tall.

18. About 95% of the men should be between _______ and _______ inches tall.

19. About 99.7% of the men should be between _______ and _______ inches tall.

This graph at https://tall.life/height-percentile-calculator-age-country/ shows that, because the standard deviations are equal, the two bell curves have essentially the same shape but the women’s graph is centered six inches below the men’s.

Exercises

Around of U.S. males in their forties weigh less than lb and weigh more than lb.[5] Assume a normal distribution.

20. What percent of U.S. males weigh between lb and lb?

21. What is the average weight? (Hint: think about symmetry.)

22. What is the standard deviation? (Hint: You have to work backwards to figure this out, but the math isn’t complicated.)

23. Based on the empirical rule, about of the men should weigh between _______ and _______ pounds.
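Working backwards can be sketched with made-up numbers (these are not the values from the exercise). Suppose 2.5% of a group falls below one cutoff and 2.5% falls above another. Then the middle 95% lies between the cutoffs, so by the empirical rule the cutoffs sit two standard deviations below and above the mean. The function name `mean_and_sd_from_cutoffs` and the cutoffs 150 and 250 are hypothetical.

```python
def mean_and_sd_from_cutoffs(low_cutoff, high_cutoff, k=2):
    """Recover the mean and standard deviation from two symmetric cutoffs
    that lie k standard deviations below and above the mean."""
    mean = (low_cutoff + high_cutoff) / 2        # symmetry: the mean is halfway
    sd = (high_cutoff - low_cutoff) / (2 * k)    # the gap spans 2k standard deviations
    return mean, sd

# Hypothetical cutoffs: 2.5% below 150 and 2.5% above 250 (assumed, not from the text).
mean, sd = mean_and_sd_from_cutoffs(150, 250)
print(f"mean = {mean}, standard deviation = {sd}")
```

If the tails held 16% each instead, the middle would be 68% and you would call the function with `k=1`.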

If you are asked only one question about the empirical rule instead of three in a row (68%, 95%, 99.7%), you will most likely be asked about the 95%. This is related to the “95% confidence interval” that is often mentioned in relation to statistics. For example, the margin of error for a poll is usually close to two standard deviations.[6]
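As a rough illustration of the polling remark, here is a sketch that assumes a simple random sample and uses the common textbook formula √(p(1 − p)/n) for the standard deviation of a sample proportion. The function name `poll_margin_of_error` and the poll size of 1000 people are made up; real pollsters often adjust this calculation.

```python
import math

def poll_margin_of_error(p, n, k=2):
    """Approximate margin of error: k standard deviations of a sample proportion.

    p is the reported proportion (for example 0.5 for 50%) and n is the
    number of people polled. Assumes a simple random sample.
    """
    sd = math.sqrt(p * (1 - p) / n)
    return k * sd

# Hypothetical poll: 1000 people, 50% choosing one answer.
margin = poll_margin_of_error(p=0.5, n=1000)
print(f"margin of error: about ±{margin:.1%}")
```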

Let’s finish up by comparing the performance of three NFL teams since the turn of the century.

The numbers of regular-season games won by the New England Patriots each NFL season from 2001-19:[7]

 year wins 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019

Exercises

For the Patriots, the mean number of wins is , and a spreadsheet tells us that the standard deviation is wins.

24. There is a chance of the Patriots winning between _______ and _______ games in a season.

25. In 2020, the Patriots won games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?

The numbers of regular-season games won by the Buffalo Bills each NFL season from 2001-19:[8]

 year wins 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019

Exercises

For the Bills, the mean number of wins is , and a spreadsheet tells us that the standard deviation is wins.

26. There is a chance of the Bills winning between _______ and _______ games in a season.

27. In 2020, the Bills won games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?

The numbers of regular-season games won by the Denver Broncos each NFL season from 2001-19:[9]

 year wins 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019

Exercises

For the Broncos, the mean number of wins is , and a spreadsheet tells us that the standard deviation is wins.

28. There is a chance of the Broncos winning between _______ and _______ games in a season.

29. In 2020, the Broncos won games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
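The "how many standard deviations from the mean" questions all use the same small calculation: subtract the mean, then divide by the standard deviation. Here is a minimal sketch; the function name and the sample numbers (a mean of 10 wins, a standard deviation of 2 wins, and a 7-win season) are hypothetical and do not belong to any of the three teams.

```python
def standard_deviations_from_mean(value, mean, sd):
    """How many standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical team: mean of 10 wins, standard deviation of 2 wins, 7-win season.
distance = standard_deviations_from_mean(value=7, mean=10, sd=2)
print(f"{distance:+.1f} standard deviations from the mean")

# By the empirical rule, results more than 2 standard deviations away
# should happen only about 5% of the time.
print("surprising" if abs(distance) > 2 else "not especially surprising")
```

A negative result means the season was below the team's average, and a positive result means it was above.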