Standard Deviation

Morgan Chase

You probably won’t need a calculator for this module.

This topic requires a leap of faith. It is one of the rare times when this textbook will say “don’t worry about why it’s true; just accept it.” 😂

Normal Distributions & Standard Deviation

[Figure: horizontal bar graph with bars labeled 4, 18, 36, 18, 4. Caption: This normal distribution of Wordle scores is formatted horizontally.]

A normal distribution, often referred to as a bell curve, is symmetrical on the left and right, with the mean, median, and mode being the value in the center. There are lots of data values near the center, then fewer and fewer as the values get further from the center. A normal distribution describes the data in many real-world situations.

One of the best ways to demonstrate the normal distribution is to drop balls through a board of evenly spaced pegs, as shown here. (The Plinko game on The Price Is Right is a well-known example of this.) Each time a ball hits a peg, it has a fifty-fifty chance of going left or right. For most balls, the numbers of lefts and rights are roughly equal, and the ball lands near the center. Only a few balls have an extremely lopsided number of lefts and rights, so there are not many balls at either end. As you can see, the distribution is not perfect, but it is approximated by the normal curve drawn on the glass.

[Figure: science museum demonstration showing balls dropped forming a bell curve]
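If you don't have a peg board handy, a short simulation can stand in for one. The Python sketch below is only an illustration under assumed settings: 12 rows of pegs, 1,000 balls, and a fixed random seed, none of which come from the museum exhibit. The function name `drop_balls` is made up for this example. Each ball makes a fifty-fifty choice at every row, and the slot it lands in is the number of times it went right.

```python
import random
from collections import Counter

def drop_balls(num_balls, num_rows, seed=0):
    """Simulate a peg board; return how many balls land in each slot."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(num_balls):
        # At each row the ball goes right with probability 1/2.
        rights = sum(rng.random() < 0.5 for _ in range(num_rows))
        counts[rights] += 1
    # Slot 0 is the far left, slot num_rows is the far right.
    return [counts[slot] for slot in range(num_rows + 1)]

bins = drop_balls(num_balls=1000, num_rows=12)
for slot, count in enumerate(bins):
    # Draw one '#' for every 5 balls to make a rough text histogram.
    print(f"{slot:2d} {'#' * (count // 5)}")
```

The printed histogram should pile up near the middle slots and thin out toward both ends, much like the balls behind the glass.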

The standard deviation is a measure of the spread of the data: data with lots of results close to the mean has a smaller standard deviation, and data with results spaced further from the mean has a larger standard deviation. (In this textbook, you will be given the value of the standard deviation of the data and will never need to calculate it.) The standard deviation is a measuring stick for a particular set of data.

[Figure: a distribution with a small standard deviation]

[Figure: a distribution with a large standard deviation]
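You won't need to calculate a standard deviation in this textbook, but if you are curious how a computer does it, here is a minimal Python sketch. The two data sets are hypothetical and were chosen only because they share the same mean; `statistics.pstdev` computes the population standard deviation, which is one of two common versions.

```python
import statistics

# Two hypothetical data sets with the same mean but different spreads.
tight = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

for name, data in (("tight", tight), ("wide", wide)):
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```

Both sets have a mean of 10, but the values bunched near the center give a standard deviation of about 0.63, while the spread-out values give about 5.66.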

The 68-95-99.7 Rule

The 68-95-99.7 rule: In a normal distribution, approximately…

$68\%$ of the numbers are within $1$ standard deviation above or below the mean
$95\%$ of the numbers are within $2$ standard deviations above or below the mean
$99.7\%$ of the numbers are within $3$ standard deviations above or below the mean

This is an empirical rule because it is based on observation of how the world works, rather than being based on a formula.^[1]
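For readers who would rather check than take the leap of faith, the three percentages can also be computed from the normal curve itself. The sketch below relies on a standard fact that this textbook does not derive: the share of a normal distribution within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$. The helper name `share_within` is made up for this example.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k) * 100:.2f}%")
```

The results come out to roughly 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.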

Returning to the ball-dropping experiment, let’s assume that the standard deviation is three columns wide.^[2] In the picture below, the green line marks the center of the distribution.

[Figure: science museum demonstration showing balls dropped forming a bell curve, with vertical lines drawn to show the standard deviations]

First, the two red lines are each three columns away from the center, which is one standard deviation above and below the center, so about 68% of the balls will land between the red lines.

Next, the two orange lines are another three columns farther away from the center, which is six columns or two standard deviations above and below the center, so about 95% of the balls will land between the orange lines.

And finally, the two purple lines are another three columns farther away from the center, which is nine columns or three standard deviations above and below the center, so about 99.7% of the balls will land between the purple lines. We can expect that $997$ out of $1{,}000$ balls will land between the purple lines, leaving only $3$ out of $1{,}000$ landing beyond the purple lines on either end.

Okay, that was a lot of information. For our purposes, the following restatement of the 68-95-99.7 rule may be more practical.

The 68-95-99.7 rule: In a normal distribution with mean $\mu$ and standard deviation $\sigma$ …

$68\%$ of the numbers are between $\mu-\sigma$ and $\mu+\sigma$
$95\%$ of the numbers are between $\mu-2\sigma$ and $\mu+2\sigma$
$99.7\%$ of the numbers are between $\mu-3\sigma$ and $\mu+3\sigma$
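This restatement translates directly into arithmetic: subtract and add one, two, or three standard deviations. Here is a minimal Python sketch of that idea. The function name `empirical_rule_intervals` and the exam-score numbers (mean 70, standard deviation 10) are hypothetical and chosen only for illustration.

```python
def empirical_rule_intervals(mean, sd):
    """Return (percent, low, high) for 1, 2, and 3 standard deviations."""
    percents = (68, 95, 99.7)
    return [(pct, mean - k * sd, mean + k * sd)
            for k, pct in enumerate(percents, start=1)]

# Hypothetical example (not from the textbook):
# exam scores with mean 70 and standard deviation 10.
for pct, low, high in empirical_rule_intervals(70, 10):
    print(f"About {pct}% of scores fall between {low} and {high}")
```

For these made-up exam scores, the three intervals are 60 to 80, 50 to 90, and 40 to 100.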

Exercises

The heights of U.S. females are normally distributed. The average height is around $63.5$ inches ( $5$ ft $3.5$ in) and the standard deviation is $3$ inches. Use the 68-95-99.7 rule to fill in the blanks.

About $68\%$ of the women should be between _______ and _______ inches tall.
About $95\%$ of the women should be between _______ and _______ inches tall.
About $99.7\%$ of the women should be between _______ and _______ inches tall.

The heights of U.S. males are normally distributed. The average height is around $69.5$ inches ( $5$ ft $9.5$ in) and the standard deviation is $3$ inches. Use the 68-95-99.7 rule to fill in the blanks.

About $68\%$ of the men should be between _______ and _______ inches tall.
About $95\%$ of the men should be between _______ and _______ inches tall.
About $99.7\%$ of the men should be between _______ and _______ inches tall.

This graph provides another way to think about the distribution of the data.

Because $68\%$ of the data are within one standard deviation of the mean, we have $34\%$ of the data slightly below the mean and $34\%$ slightly above. Moving outwards one more standard deviation in each direction, we have another $13.5\%$ below the mean and another $13.5\%$ above the mean, encompassing a total of $95\%$ of the data. Moving outwards one more standard deviation, we have another $2.35\%$ far below the mean and another $2.35\%$ far above the mean, bringing the total up to $99.7\%$ of the data. This leaves only $0.15\%$ of the data more than three standard deviations below the mean and $0.15\%$ of the data more than three standard deviations above the mean.
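The percentages in this breakdown come from the 68-95-99.7 rule together with the symmetry of the curve. As a quick check of the arithmetic, with each line giving the share on one side of the mean:

$$
\begin{aligned}
\text{within 1 standard deviation:}\quad & 68\% \div 2 = 34\% \\
\text{between 1 and 2 standard deviations:}\quad & (95\% - 68\%) \div 2 = 13.5\% \\
\text{between 2 and 3 standard deviations:}\quad & (99.7\% - 95\%) \div 2 = 2.35\% \\
\text{beyond 3 standard deviations:}\quad & (100\% - 99.7\%) \div 2 = 0.15\%
\end{aligned}
$$

Adding the four pieces gives $50\%$ on each side of the mean, as symmetry requires.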

Exercises

Around $16\%$ of U.S. males in their forties weigh less than $160$ lb and $16\%$ weigh more than $230$ lb. Assume a normal distribution.^[3]

What percent of U.S. males in their forties weigh between $160$ lb and $230$ lb?
What is the average weight? (Hint: think about symmetry.)
What is the standard deviation? (Hint: You have to work backwards to figure this out, but the math isn’t complicated.)
Based on the empirical rule, about $95\%$ of the men should weigh between _______ and _______ pounds.

This version of the graph can help us group the data into general categories. This is not official terminology, but hopefully it gets the point across.

Looking at the middle $68\%$ of the data, $34\%$ could be considered “slightly low” and $34\%$ could be considered “slightly high”. Moving outwards, we have another $13.5\%$ that could be considered “low” and another $13.5\%$ that could be considered “high”. Moving outwards again, we have another $2.35\%$ that could be considered “very low” and another $2.35\%$ that could be considered “very high”. Finally, $0.15\%$ of the data could be considered “extremely low” and $0.15\%$ could be considered “extremely high”.

If you are asked only one question about the empirical rule instead of three in a row ( $68\%$ , $95\%$ , $99.7\%$ ), you will most likely be asked about the $95\%$ . This is related to the “ $95\%$ confidence interval” that is often mentioned in relation to statistics. For example, the margin of error for a poll is usually close to two standard deviations.^[4]

Let’s finish up by comparing the performance of three NFL teams at the beginning of this century.

Exercises

The numbers of regular-season games won by the New England Patriots^[5] each NFL season from 2001 to 2019: $11$, $9$, $14$, $14$, $10$, $12$, $16$, $11$, $10$, $14$, $13$, $12$, $12$, $12$, $12$, $14$, $13$, $11$, $12$.

The mean number of wins is $12.2$, and a spreadsheet tells us that the standard deviation is $1.7$ wins.

There is a $95\%$ chance of the Patriots winning between _______ and _______ games in a season.
In 2020, the Patriots won $7$ games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?

The numbers of regular-season games won by the Buffalo Bills^[6] each NFL season from 2001 to 2019: $3$, $8$, $6$, $9$, $5$, $7$, $7$, $7$, $6$, $4$, $6$, $6$, $6$, $9$, $8$, $7$, $9$, $6$, $10$.

The mean number of wins is $6.8$, and a spreadsheet tells us that the standard deviation is $1.7$ wins.

There is a $95\%$ chance of the Bills winning between _______ and _______ games in a season.
In 2020, the Bills won $13$ games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?

The numbers of regular-season games won by the Denver Broncos^[7] each NFL season from 2001 to 2019: $8$, $9$, $10$, $10$, $13$, $9$, $7$, $8$, $8$, $4$, $8$, $13$, $13$, $12$, $12$, $9$, $5$, $6$, $7$.

The mean number of wins is $9.0$, and a spreadsheet tells us that the standard deviation is $2.6$ wins.

There is a $95\%$ chance of the Broncos winning between _______ and _______ games in a season.
In 2020, the Broncos won $5$ games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
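If you would like to see where the spreadsheet numbers come from, the Python sketch below recomputes the mean and standard deviation for each team from the win totals listed in these exercises. It assumes the spreadsheet used the population standard deviation (`STDEV.P` in many spreadsheet programs); the sample version would give slightly larger values. The helper names `summarize` and `deviations_from_mean` are made up for this example.

```python
import statistics

# Win totals copied from the exercises (2001 through 2019 seasons).
wins = {
    "Patriots": [11, 9, 14, 14, 10, 12, 16, 11, 10, 14, 13, 12, 12, 12, 12, 14, 13, 11, 12],
    "Bills": [3, 8, 6, 9, 5, 7, 7, 7, 6, 4, 6, 6, 6, 9, 8, 7, 9, 6, 10],
    "Broncos": [8, 9, 10, 10, 13, 9, 7, 8, 8, 4, 8, 13, 13, 12, 12, 9, 5, 6, 7],
}

def summarize(data):
    """Return (mean, population standard deviation), each rounded to one decimal."""
    return round(statistics.fmean(data), 1), round(statistics.pstdev(data), 1)

def deviations_from_mean(value, mean, sd):
    """How many standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

for team, data in wins.items():
    mean, sd = summarize(data)
    print(f"{team}: mean {mean}, standard deviation {sd}")

# Hypothetical numbers, unrelated to the exercises:
# a value of 15 with mean 10 and standard deviation 2.5.
print(deviations_from_mean(15, 10, 2.5))
```

The last line shows the idea behind the "how many standard deviations" questions: subtract the mean, then divide by the standard deviation. For the made-up numbers, 15 is two standard deviations above the mean.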

[1] Well, there is a formula, $y=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$, but it was discovered after the fact.
[2] I eyeballed it and it seemed like a reasonable assumption.
[3] Source (PDF file): https://www2.census.gov/library/publications/2010/compendia/statab/130ed/tables/11s0205.pdf
[4] Source: https://en.wikipedia.org/wiki/Standard_deviation
[5] Source: https://www.pro-football-reference.com/teams/nwe/index.htm
[6] Source: https://www.pro-football-reference.com/teams/buf/index.htm
[7] Source: https://www.pro-football-reference.com/teams/den/index.htm

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
