30 Standard Deviation
You probably won’t need a calculator for this module.
This topic requires a leap of faith. It is one of the rare times when this textbook will say “don’t worry about why it’s true; just accept it.” 😂
Normal Distributions & Standard Deviation
A normal distribution, often referred to as a bell curve, is symmetrical on the left and right, with the mean, median, and mode being the value in the center. There are lots of data values near the center, then fewer and fewer as the values get further from the center. A normal distribution describes the data in many real-world situations.
One of the best ways to demonstrate the normal distribution is to drop balls through a board of evenly spaced pegs, as shown here. (The Plinko game on The Price Is Right is a well-known example of this.) Each time a ball hits a peg, it has a fifty-fifty chance of going left or right. For most balls, the numbers of lefts and rights are roughly equal, and the ball lands near the center. Only a few balls have an extremely lopsided number of lefts and rights, so there are not many balls at either end. As you can see, the distribution is not perfect, but it is approximated by the normal curve drawn on the glass.
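If you would like to try the ball-dropping experiment without building a board, here is a minimal simulation sketch in Python. The number of balls, the number of rows of pegs, and the function name `drop_balls` are all assumptions chosen for illustration; they do not describe the board in the picture.

```python
import random
from collections import Counter

def drop_balls(num_balls, num_rows, seed=0):
    """Simulate balls falling through rows of pegs (hypothetical board).

    At each peg a ball goes right with probability 1/2. The slot a ball
    lands in is simply the number of times it went right.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    landing = Counter()
    for _ in range(num_balls):
        rights = sum(rng.random() < 0.5 for _ in range(num_rows))
        landing[rights] += 1
    return landing

# Assumed sizes: 2000 balls and 12 rows of pegs, giving slots 0 through 12.
counts = drop_balls(num_balls=2000, num_rows=12)
for slot in range(13):
    # One '#' for every 20 balls in the slot, as a rough text histogram.
    print(f"{slot:2d} {'#' * (counts[slot] // 20)}")
```

The printed bars should pile up around the middle slots and thin out toward both ends, which is the same bell shape described above.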
The standard deviation is a measure of the spread of the data: data with lots of results close to the mean has a smaller standard deviation, and data with results spaced further from the mean has a larger standard deviation. (In this textbook, you will be given the value of the standard deviation of the data and will never need to calculate it.) The standard deviation is a measuring stick for a particular set of data.
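You will not need to calculate a standard deviation in this textbook, but seeing one computed may make the idea of "spread" more concrete. The sketch below uses two made-up data sets with the same mean. It assumes the population standard deviation (`statistics.pstdev`); a spreadsheet may use the slightly different sample version.

```python
from statistics import mean, pstdev

# Two hypothetical data sets, both with a mean of 10.
tight = [9, 10, 10, 10, 11]    # results close to the mean
spread = [2, 6, 10, 14, 18]    # results spaced further from the mean

for name, data in (("tight", tight), ("spread", spread)):
    print(f"{name}: mean = {mean(data)}, standard deviation = {pstdev(data):.2f}")
```

The data set whose values sit close to the mean reports the smaller standard deviation, and the more spread-out data set reports the larger one.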
The 68-95-99.7 Rule
The 68-95-99.7 rule: In a normal distribution, approximately…
- 68% of the numbers are within 1 standard deviation above or below the mean
- 95% of the numbers are within 2 standard deviations above or below the mean
- 99.7% of the numbers are within 3 standard deviations above or below the mean
This is an empirical rule because it is based on observation of how the world works, rather than being based on a formula.[1]
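If you are curious, the three percentages in the rule can also be checked numerically. The sketch below uses Python's built-in `statistics.NormalDist` for an idealized normal distribution; the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

# An idealized normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The results come out very close to 68%, 95%, and 99.7%, which is why the rounded numbers in the rule are good enough for our purposes.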
Returning to the ball-dropping experiment, let’s assume that the standard deviation is three columns wide.[2] In the picture below, the green line marks the center of the distribution.
First, the two red lines are each three columns away from the center, which is one standard deviation above and below the center, so about 68% of the balls will land between the red lines.
Next, the two orange lines are another three columns farther away from the center, which is six columns or two standard deviations above and below the center, so about 95% of the balls will land between the orange lines.
And finally, the two purple lines are another three columns farther away from the center, which is nine columns or three standard deviations above and below the center, so about 99.7% of the balls will land between the purple lines. We can expect that 997 out of 1,000 balls will land between the purple lines, leaving only 3 out of 1,000 landing beyond the purple lines on either end.
Okay, that was a lot of information. For our purposes, the following restatement of the 68-95-99.7 rule may be more practical.
The 68-95-99.7 rule: In a normal distribution with mean μ and standard deviation σ…
- 68% of the numbers are between μ − σ and μ + σ
- 95% of the numbers are between μ − 2σ and μ + 2σ
- 99.7% of the numbers are between μ − 3σ and μ + 3σ
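The restated rule is just addition and subtraction, as the following sketch shows. The mean of 100 and standard deviation of 15 are made-up values for illustration, and `empirical_rule_ranges` is a hypothetical helper name.

```python
def empirical_rule_ranges(mean, sd):
    """Return the 68%, 95%, and 99.7% ranges for a normal distribution."""
    return {
        "68%": (mean - sd, mean + sd),            # 1 standard deviation
        "95%": (mean - 2 * sd, mean + 2 * sd),    # 2 standard deviations
        "99.7%": (mean - 3 * sd, mean + 3 * sd),  # 3 standard deviations
    }

# Hypothetical data: mean 100, standard deviation 15.
ranges = empirical_rule_ranges(mean=100, sd=15)
for share, (low, high) in ranges.items():
    print(f"About {share} of the numbers are between {low} and {high}")
```

With these assumed values, the three ranges are 85 to 115, 70 to 130, and 55 to 145.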
Exercises
- About 68% of the women should be between _______ and _______ inches tall.
- About 95% of the women should be between _______ and _______ inches tall.
- About 99.7% of the women should be between _______ and _______ inches tall.
The heights of U.S. males are normally distributed. The average height is around inches ( ft in) and the standard deviation is inches. Use the 68-95-99.7 rule to fill in the blanks.
- About 68% of the men should be between _______ and _______ inches tall.
- About 95% of the men should be between _______ and _______ inches tall.
- About 99.7% of the men should be between _______ and _______ inches tall.
This graph provides another way to think about the distribution of the data.
Because 68% of the data are within one standard deviation of the mean, we have 34% of the data slightly below the mean and 34% slightly above. Moving outwards one more standard deviation in each direction, we have another 13.5% below the mean and another 13.5% above the mean, encompassing a total of 95% of the data. Moving outwards one more standard deviation, we have another 2.35% far below the mean and another 2.35% far above the mean, bringing the total up to 99.7% of the data. This leaves only 0.15% of the data more than three standard deviations below the mean and 0.15% of the data more than three standard deviations above the mean.
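The percentage in each band comes from subtracting one total from the next and splitting the difference evenly between the two sides, thanks to symmetry. Here is that arithmetic as a short sketch; the variable names are assumptions for illustration.

```python
# Running totals from the 68-95-99.7 rule, ending with 100% of the data.
cumulative = [0.0, 68.0, 95.0, 99.7, 100.0]
bands = [
    "within 1 standard deviation",
    "between 1 and 2 standard deviations",
    "between 2 and 3 standard deviations",
    "more than 3 standard deviations",
]

band_shares = []
for i, band in enumerate(bands, start=1):
    # What this band adds to the total, split evenly between the two sides.
    each_side = round((cumulative[i] - cumulative[i - 1]) / 2, 2)
    band_shares.append(each_side)
    print(f"{band}: {each_side}% below the mean and {each_side}% above")
```

The eight pieces (four below the mean and four above) add back up to 100% of the data.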
Exercises
Around of U.S. males in their forties weigh less than lb and weigh more than lb. Assume a normal distribution.[3]
- What percent of U.S. males weigh between lb and lb?
- What is the average weight? (Hint: think about symmetry.)
- What is the standard deviation? (Hint: You have to work backwards to figure this out, but the math isn’t complicated.)
- Based on the empirical rule, about of the men should weigh between _______ and _______ pounds.
This version of the graph can help us group the data into general categories. This is not official terminology, but hopefully it gets the point across.
Looking at the middle of the data, 34% could be considered “slightly low” and 34% could be considered “slightly high”. Moving outwards, we have another 13.5% that could be considered “low” and another 13.5% that could be considered “high”. Moving outwards again, we have another 2.35% that could be considered “very low” and another 2.35% that could be considered “very high”. Finally, 0.15% of the data could be considered “extremely low” and 0.15% could be considered “extremely high”.
If you are asked only one question about the empirical rule instead of three in a row (68%, 95%, 99.7%), you will most likely be asked about the 95%. This is related to the “95% confidence interval” that is often mentioned in relation to statistics. For example, the margin of error for a poll is usually close to two standard deviations.[4]
Let’s finish up by comparing the performance of three NFL teams at the beginning of this century.
Exercises
The numbers of regular-season games won by the New England Patriots[5] each NFL season from 2001-2019: , , , , , , , , , , , , , , , , , , .
The mean number of wins is , and a spreadsheet tells us that the standard deviation is wins.
- There is a chance of the Patriots winning between _______ and _______ games in a season.
- In 2020, the Patriots won games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
The numbers of regular-season games won by the Buffalo Bills[6] each NFL season from 2001-2019: , , , , , , , , , , , , , , , , , , .
The mean number of wins is , and a spreadsheet tells us that the standard deviation is wins.
- There is a chance of the Bills winning between _______ and _______ games in a season.
- In 2020, the Bills won games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
The numbers of regular-season games won by the Denver Broncos[7] each NFL season from 2001-2019: , , , , , , , , , , , , , , , , , , .
The mean number of wins is , and a spreadsheet tells us that the standard deviation is wins.
- There is a chance of the Broncos winning between _______ and _______ games in a season.
- In 2020, the Broncos won games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
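The last question in each exercise above asks how many standard deviations a result is from the mean. One way to sketch that calculation is shown below. The numbers are made up for illustration and are not the actual data for any of these teams, and `deviations_from_mean` is a hypothetical helper name.

```python
def deviations_from_mean(value, mean, sd):
    """How many standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical team: mean of 8 wins, standard deviation of 2 wins, 13 wins this season.
z = deviations_from_mean(value=13, mean=8, sd=2)
print(f"{z} standard deviations from the mean")

# A result more than 2 standard deviations away falls outside the 95% range.
if abs(z) > 2:
    print("Outside the range that covers about 95% of seasons")
else:
    print("Inside the range that covers about 95% of seasons")
```

A negative answer would mean the result is below the mean rather than above it.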
[1] Well, there is a formula, , but it was discovered after the fact.
[2] I eyeballed it and it seemed like a reasonable assumption.
[3] Source (PDF file): https://www2.census.gov/library/publications/2010/compendia/statab/130ed/tables/11s0205.pdf
[4] Source: https://en.wikipedia.org/wiki/Standard_deviation
[5] Source: https://www.pro-football-reference.com/teams/nwe/index.htm
[6] Source: https://www.pro-football-reference.com/teams/buf/index.htm
[7] Source: https://www.pro-football-reference.com/teams/den/index.htm