30 Standard Deviation
You probably won’t need a calculator for this module.
This topic requires a leap of faith. It is one of the rare times when this textbook will say “don’t worry about why it’s true; just accept it.” 😂
Normal Distributions & Standard Deviation

A normal distribution, often referred to as a bell curve, is symmetrical on the left and right, with the mean, median, and mode being the value in the center. There are lots of data values near the center, then fewer and fewer as the values get further from the center. A normal distribution describes the data in many real-world situations.
One of the best ways to demonstrate the normal distribution is to drop balls through a board of evenly spaced pegs, as shown here. (The Plinko game on The Price Is Right is a well-known example of this.) Each time a ball hits a peg, it has a fifty-fifty chance of going left or right. For most balls, the numbers of lefts and rights are roughly equal, and the ball lands near the center. Only a few balls have an extremely lopsided number of lefts and rights, so there are not many balls at either end. As you can see, the distribution is not perfect, but it is approximated by the normal curve drawn on the glass.
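If you are curious, the ball-dropping experiment can be imitated on a computer. The following is only a rough sketch: the number of balls (1000), the number of rows of pegs (12), and the function name `drop_balls` are assumptions chosen for illustration, not details of the board in the picture.

```python
import random
from collections import Counter


def drop_balls(num_balls, num_rows, seed=0):
    """Simulate balls falling through rows of pegs; return a count per landing slot."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    slots = Counter()
    for _ in range(num_balls):
        # Each peg sends the ball right (1) or left (0) with equal chance.
        rights = sum(rng.randint(0, 1) for _ in range(num_rows))
        slots[rights] += 1  # the slot number is the total number of rights
    return slots


counts = drop_balls(num_balls=1000, num_rows=12)

# Print a sideways bar chart: one '#' for every 5 balls in a slot.
for slot in range(13):
    print(f"{slot:2d} {'#' * (counts[slot] // 5)}")
```

The bars should pile up around the middle slots and thin out toward both ends, which is the same bell shape described above.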
The standard deviation is a measure of the spread of the data: data with lots of results close to the mean has a smaller standard deviation, and data with results spaced further from the mean has a larger standard deviation. (In this textbook, you will be given the value of the standard deviation of the data and will never need to calculate it.) The standard deviation is a measuring stick for a particular set of data.
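To see what "spread" means in practice, here is a small sketch that lets Python compute the standard deviation of two made-up sets of quiz scores. The scores are hypothetical; both sets have the same mean, but one is bunched tightly around it and the other is not.

```python
from statistics import mean, pstdev

# Hypothetical quiz scores; both lists have a mean of 70.
tight = [68, 69, 70, 70, 71, 72]   # results close to the mean
wide = [40, 55, 70, 70, 85, 100]   # results spaced farther from the mean

for name, scores in [("tight", tight), ("wide", wide)]:
    # pstdev treats the list as the whole data set (population standard deviation).
    print(name, "mean:", mean(scores), "standard deviation:", round(pstdev(scores), 1))
```

The tightly bunched scores should come out with a much smaller standard deviation than the widely spaced ones.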


The 68-95-99.7 Rule

The 68-95-99.7 rule: In a normal distribution, approximately…
- 68% of the numbers are within 1 standard deviation above or below the mean
- 95% of the numbers are within 2 standard deviations above or below the mean
- 99.7% of the numbers are within 3 standard deviations above or below the mean
This is an empirical rule because it is based on observation of how the world works, rather than being based on a formula.[1]
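As a rough check of these three percentages, Python's built-in `statistics.NormalDist` class can report how much of an ideal normal distribution lies within a given number of standard deviations of the mean. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

# An ideal normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The results are close to, but not exactly, 68%, 95%, and 99.7%; the rule rounds them to numbers that are easy to remember.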
Returning to the ball-dropping experiment, let’s assume that the standard deviation is three columns wide.[2] In the picture below, the green line marks the center of the distribution.
First, the two red lines are each three columns away from the center, which is one standard deviation above and below the center, so about 68% of the balls will land between the red lines.
Next, the two orange lines are another three columns farther away from the center, which is six columns or two standard deviations above and below the center, so about 95% of the balls will land between the orange lines.
And finally, the two purple lines are another three columns farther away from the center, which is nine columns or three standard deviations above and below the center, so about 99.7% of the balls will land between the purple lines. We can expect that 997 out of 1000 balls will land between the purple lines, leaving only 3 out of 1000 landing beyond the purple lines on either end.
Okay, that was a lot of information. For our purposes, the following restatement of the 68-95-99.7 rule may be more practical.
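The 68-95-99.7 rule: In a normal distribution with mean $\mu$ and standard deviation $\sigma$…
- 68% of the numbers are between $\mu - \sigma$ and $\mu + \sigma$
- 95% of the numbers are between $\mu - 2\sigma$ and $\mu + 2\sigma$
- 99.7% of the numbers are between $\mu - 3\sigma$ and $\mu + 3\sigma$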
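The restated rule is just addition and subtraction, which the sketch below spells out. The function name `empirical_rule_ranges` and the example numbers (test scores with a mean of 100 and a standard deviation of 15) are hypothetical and are not taken from the exercises.

```python
def empirical_rule_ranges(mean, sd):
    """Return the 68%, 95%, and 99.7% ranges as (low, high) pairs."""
    return {
        "68%": (mean - sd, mean + sd),            # one standard deviation each way
        "95%": (mean - 2 * sd, mean + 2 * sd),    # two standard deviations each way
        "99.7%": (mean - 3 * sd, mean + 3 * sd),  # three standard deviations each way
    }


# Hypothetical test scores: mean 100, standard deviation 15.
ranges = empirical_rule_ranges(100, 15)
for share, (low, high) in ranges.items():
    print(f"about {share} of the scores are between {low} and {high}")
```

For these made-up scores, the ranges work out to 85 to 115, 70 to 130, and 55 to 145.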
Exercises




- About 68% of the women should be between _______ and _______ inches tall.
- About 95% of the women should be between _______ and _______ inches tall.
- About 99.7% of the women should be between _______ and _______ inches tall.
The heights of U.S. males are normally distributed. The average height is around inches (
ft
in) and the standard deviation is
inches. Use the 68-95-99.7 rule to fill in the blanks.
- About 68% of the men should be between _______ and _______ inches tall.
- About 95% of the men should be between _______ and _______ inches tall.
- About 99.7% of the men should be between _______ and _______ inches tall.
This graph provides another way to think about the distribution of the data.
Because 68% of the data are within one standard deviation of the mean, we have 34% of the data slightly below the mean and 34% slightly above. Moving outwards one more standard deviation in each direction, we have another 13.5% below the mean and another 13.5% above the mean, encompassing a total of 95% of the data. Moving outwards one more standard deviation, we have another 2.35% far below the mean and another 2.35% far above the mean, bringing the total up to 99.7% of the data. This leaves only 0.15% of the data more than three standard deviations below the mean and 0.15% of the data more than three standard deviations above the mean.
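The percentage in each band follows from the 68-95-99.7 rule by subtracting and then splitting the result in half, because the curve is symmetrical. Here is a short sketch of that arithmetic; the variable names and band labels are made up for illustration.

```python
# Percent of the data within 1, 2, and 3 standard deviations of the mean.
within = {1: 68.0, 2: 95.0, 3: 99.7}

# Each band appears once below the mean and once above, so halve each difference.
half_bands = {
    "mean to 1 standard deviation": within[1] / 2,
    "1 to 2 standard deviations": (within[2] - within[1]) / 2,
    "2 to 3 standard deviations": (within[3] - within[2]) / 2,
    "beyond 3 standard deviations": (100 - within[3]) / 2,
}

for label, percent in half_bands.items():
    # Round to hide tiny floating-point leftovers such as 2.3500000000000014.
    print(f"{label}: {round(percent, 2)}% on each side")
```

Doubling the four values and adding them up gives 100%, which is a quick way to confirm that no part of the data has been left out.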
Exercises
Around of U.S. males in their forties weigh less than
lb and
weigh more than
lb. Assume a normal distribution.[3]
- What percent of U.S. males weigh between
lb and
lb?
- What is the average weight? (Hint: think about symmetry.)
- What is the standard deviation? (Hint: You have to work backwards to figure this out, but the math isn’t complicated.)
- Based on the empirical rule, about
of the men should weigh between _______ and _______ pounds.
This version of the graph can help us group the data into general categories. This is not official terminology, but hopefully it gets the point across.
Looking at the middle of the data, 34% could be considered “slightly low” and 34% could be considered “slightly high”. Moving outwards, we have another 13.5% that could be considered “low” and another 13.5% that could be considered “high”. Moving outwards again, we have another 2.35% that could be considered “very low” and another 2.35% that could be considered “very high”. Finally, 0.15% of the data could be considered “extremely low” and 0.15% could be considered “extremely high”.
If you are asked only one question about the empirical rule instead of three in a row (68%, 95%, 99.7%), you will most likely be asked about the 95%. This is related to the “95% confidence interval” that is often mentioned in relation to statistics. For example, the margin of error for a poll is usually close to two standard deviations.[4]
Let’s finish up by comparing the performance of three NFL teams at the beginning of this century.
Exercises
The numbers of regular-season games won by the New England Patriots[5] each NFL season from 2001-2019: ,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
.
The mean number of wins is , and a spreadsheet tells us that the standard deviation is
wins.
- There is a
chance of the Patriots winning between _______ and _______ games in a season.
- In 2020, the Patriots won
games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
The numbers of regular-season games won by the Buffalo Bills[6] each NFL season from 2001-2019: ,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
.
The mean number of wins is , and a spreadsheet tells us that the standard deviation is
wins.
- There is a
chance of the Bills winning between _______ and _______ games in a season.
- In 2020, the Bills won
games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
The numbers of regular-season games won by the Denver Broncos[7] each NFL season from 2001-2019: ,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
.
The mean number of wins is , and a spreadsheet tells us that the standard deviation is
wins.
- There is a
chance of the Broncos winning between _______ and _______ games in a season.
- In 2020, the Broncos won
games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
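The last question in each exercise above asks how many standard deviations a value is from the mean. One way to think about it: find the distance from the mean, then divide by the standard deviation. The sketch below uses a hypothetical team (a mean of 9 wins, a standard deviation of 2 wins, and a 5-win season) rather than any of the teams above, and the function name is made up.

```python
def standard_deviations_from_mean(value, mean, sd):
    """How many standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / sd


# Hypothetical team: mean of 9 wins, standard deviation of 2 wins, then a 5-win season.
distance = standard_deviations_from_mean(5, 9, 2)
print(distance)  # negative means below the mean

# A value more than 2 standard deviations from the mean falls outside the 95% range.
print("outside the 95% range:", abs(distance) > 2)
```

In this made-up case the season is exactly two standard deviations below the mean, so it sits right at the edge of the 95% range.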
[1] Well, there is a formula, , but it was discovered after the fact.
[2] I eyeballed it and it seemed like a reasonable assumption.
[3] Source (PDF file): https://www2.census.gov/library/publications/2010/compendia/statab/130ed/tables/11s0205.pdf
[4] Source: https://en.wikipedia.org/wiki/Standard_deviation
[5] Source: https://www.pro-football-reference.com/teams/nwe/index.htm
[6] Source: https://www.pro-football-reference.com/teams/buf/index.htm
[7] Source: https://www.pro-football-reference.com/teams/den/index.htm