30 Standard Deviation

You probably won’t need a calculator for this module.

This topic requires a leap of faith. It is one of the rare times when this textbook will say “don’t worry about why it’s true; just accept it.” 😂

Normal Distributions & Standard Deviation

horizontal bar graph with bars labeled 4, 18, 36, 18, 4
This normal distribution of Wordle scores is formatted horizontally.

A normal distribution, often referred to as a bell curve, is symmetrical on the left and right, with the mean, median, and mode being the value in the center. There are lots of data values near the center, then fewer and fewer as the values get further from the center. A normal distribution describes the data in many real-world situations.

One of the best ways to demonstrate the normal distribution is to drop balls through a board of evenly spaced pegs, as shown here. (The Plinko game on The Price Is Right is a well-known example of this.) Each time a ball hits a peg, it has a fifty-fifty chance of going left or right. For most balls, the numbers of lefts and rights are roughly equal, and the ball lands near the center. Only a few balls have an extremely lopsided number of lefts and rights, so there are not many balls at either end. As you can see, the distribution is not perfect, but it is approximated by the normal curve drawn on the glass.

science museum demonstration showing balls dropped forming a bell curve
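If you would like to try the ball-dropping experiment without a trip to the science museum, here is a rough Python sketch of it. The 12 rows of pegs and the 1,000 balls are assumptions chosen for illustration, not measurements of the real exhibit, and the function name `drop_balls` is made up for this example.

```python
import random
from collections import Counter


def drop_balls(num_balls, num_rows, seed=0):
    """Simulate balls falling through rows of pegs; return a count per landing slot."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    slots = Counter()
    for _ in range(num_balls):
        # Each peg sends the ball right (1) or left (0) with equal chance.
        # The landing slot is the total number of "rights".
        rights = sum(rng.randint(0, 1) for _ in range(num_rows))
        slots[rights] += 1
    return slots


# Assumed sizes for illustration: 1,000 balls and 12 rows of pegs.
counts = drop_balls(num_balls=1000, num_rows=12)

# Print a sideways bar graph, one "#" for every 5 balls in a slot.
for slot in range(13):
    print(f"{slot:2d} {'#' * (counts[slot] // 5)}")
```

The printed bars should bulge in the middle and thin out toward both ends, roughly like the pile of balls behind the glass.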

The standard deviation is a measure of the spread of the data: data with lots of results close to the mean has a smaller standard deviation, and data with results spaced further from the mean has a larger standard deviation. (In this textbook, you will be given the value of the standard deviation of the data and will never need to calculate it.) The standard deviation is a measuring stick for a particular set of data.

a distribution with a small standard deviation
a distribution with a large standard deviation
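You will not be asked to calculate a standard deviation in this textbook, but if you are curious how a computer reports it, the sketch below compares two made-up data sets that share the same mean. The numbers are invented for illustration, and `statistics.pstdev` (the population standard deviation in Python's standard library) is just one reasonable choice of function.

```python
import statistics

# Two made-up data sets with the same mean (10) but different spreads.
tight = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

for name, data in [("tight", tight), ("wide", wide)]:
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```

The tightly clustered data has a standard deviation of about 0.63, while the spread-out data has a standard deviation of about 5.66.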

The 68-95-99.7 Rule

a bell curve showing the 68-95-99.7 rule: matching vertical lines one unit to the left and right of the center include 68 percent of the data, matching vertical lines two units to the left and right of the center include 95 percent of the data, and matching vertical lines three units to the left and right of the center include 99.7 percent of the data.
The 68-95-99.7 rule, in Swedish. Image credit: Svjo, hosted at Wikimedia Commons.

The 68-95-99.7 rule: In a normal distribution, approximately…

  • 68% of the numbers are within 1 standard deviation above or below the mean
  • 95% of the numbers are within 2 standard deviations above or below the mean
  • 99.7% of the numbers are within 3 standard deviations above or below the mean

This is an empirical rule because it is based on observation of how the world works, rather than being based on a formula.[1]
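If you want a quick check on these three percentages, Python's standard library includes a model of an idealized normal curve. The sketch below is only a sanity check, not part of the rule itself. It uses a mean of 0 and a standard deviation of 1, which are the usual defaults rather than values from any data in this module.

```python
from statistics import NormalDist

# An idealized normal curve with mean 0 and standard deviation 1 (assumed values).
standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Fraction of values between k standard deviations below and above the mean.
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.1%}")
```

This prints 68.3%, 95.4%, and 99.7%, which shows that the percentages in the rule are slightly rounded.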

Returning to the ball-dropping experiment, let’s assume that the standard deviation is three columns wide.[2] In the picture below, the green line marks the center of the distribution.

science museum demonstration showing balls dropped forming a bell curve, with vertical lines drawn to show the standard deviations

First, the two red lines are each three columns away from the center, which is one standard deviation above and below the center, so about 68% of the balls will land between the red lines.

Next, the two orange lines are another three columns farther away from the center, which is six columns or two standard deviations above and below the center, so about 95% of the balls will land between the orange lines.

And finally, the two purple lines are another three columns farther away from the center, which is nine columns or three standard deviations above and below the center, so about 99.7% of the balls will land between the purple lines. We can expect that 997 out of 1,000 balls will land between the purple lines, leaving only 3 out of 1,000 landing beyond the purple lines on either end.


Okay, that was a lot of information. For our purposes, the following restatement of the 68-95-99.7 rule may be more practical.

The 68-95-99.7 rule: In a normal distribution with mean $\mu$ and standard deviation $\sigma$, approximately…

  • 68% of the numbers are between $\mu-\sigma$ and $\mu+\sigma$
  • 95% of the numbers are between $\mu-2\sigma$ and $\mu+2\sigma$
  • 99.7% of the numbers are between $\mu-3\sigma$ and $\mu+3\sigma$
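The restated rule is just addition and subtraction, as the following sketch shows. The function name `empirical_rule_ranges` and the test-score numbers (mean 100, standard deviation 15) are hypothetical, chosen only to illustrate the arithmetic.

```python
def empirical_rule_ranges(mean, sd):
    """Return the (low, high) range for each percentage in the 68-95-99.7 rule."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


# Hypothetical example: test scores with a mean of 100 and a standard deviation of 15.
for percent, (low, high) in empirical_rule_ranges(100, 15).items():
    print(f"About {percent} of the scores are between {low} and {high}")
```

For these made-up scores, the three ranges come out to 85 to 115, 70 to 130, and 55 to 145.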

Exercises

The heights of U.S. females are normally distributed. The average height is around 63.5 inches (5 ft 3.5 in) and the standard deviation is 3 inches. Use the 68-95-99.7 rule to fill in the blanks.

  1. About 68% of the women should be between _______ and _______ inches tall.
  2. About 95% of the women should be between _______ and _______ inches tall.
  3. About 99.7% of the women should be between _______ and _______ inches tall.

The heights of U.S. males are normally distributed. The average height is around 69.5 inches (5 ft 9.5 in) and the standard deviation is 3 inches. Use the 68-95-99.7 rule to fill in the blanks.

  1. About 68% of the men should be between _______ and _______ inches tall.
  2. About 95% of the men should be between _______ and _______ inches tall.
  3. About 99.7% of the men should be between _______ and _______ inches tall.

This graph provides another way to think about the distribution of the data.

Because 68% of the data are within one standard deviation of the mean, we have 34% of the data slightly below the mean and 34% slightly above. Moving outwards one more standard deviation in each direction, we have another 13.5% below the mean and another 13.5% above the mean, encompassing a total of 95% of the data. Moving outwards one more standard deviation, we have another 2.35% far below the mean and another 2.35% far above the mean, bringing the total up to 99.7% of the data. This leaves only 0.15% of the data more than three standard deviations below the mean and 0.15% of the data more than three standard deviations above the mean.
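Each of those smaller percentages comes from subtracting one piece of the 68-95-99.7 rule from the next and splitting the difference in half, because the curve is symmetrical. Here is that arithmetic as a short sketch. The variable names are made up for this example.

```python
within_1, within_2, within_3 = 68.0, 95.0, 99.7  # percentages from the rule

band_1 = within_1 / 2               # between the mean and 1 SD, on each side
band_2 = (within_2 - within_1) / 2  # between 1 SD and 2 SD, on each side
band_3 = (within_3 - within_2) / 2  # between 2 SD and 3 SD, on each side
tail = (100 - within_3) / 2         # beyond 3 SD, on each side

print(round(band_1, 2), round(band_2, 2), round(band_3, 2), round(tail, 2))
```

The four results are 34, 13.5, 2.35, and 0.15 percent, and doubling their sum brings you back to 100 percent.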

Exercises

Around 16% of U.S. males in their forties weigh less than 160 lb and 16% weigh more than 230 lb. Assume a normal distribution.[3]

  1. What percent of U.S. males weigh between 160 lb and 230 lb?
  2. What is the average weight? (Hint: think about symmetry.)
  3. What is the standard deviation? (Hint: You have to work backwards to figure this out, but the math isn’t complicated.)
  4. Based on the empirical rule, about 95% of the men should weigh between _______ and _______ pounds.

This version of the graph can help us group the data into general categories. This is not official terminology, but hopefully it gets the point across.

Looking at the middle 68% of the data, 34% could be considered “slightly low” and 34% could be considered “slightly high”. Moving outwards, we have another 13.5% that could be considered “low” and another 13.5% that could be considered “high”. Moving outwards again, we have another 2.35% that could be considered “very low” and another 2.35% that could be considered “very high”. Finally, 0.15% of the data could be considered “extremely low” and 0.15% could be considered “extremely high”.


If you are asked only one question about the empirical rule instead of three in a row (68%, 95%, 99.7%), you will most likely be asked about the 95%. This is related to the “95% confidence interval” that is often mentioned in relation to statistics. For example, the margin of error for a poll is usually close to two standard deviations.[4]

Let’s finish up by comparing the performance of three NFL teams at the beginning of this century.

Exercises

The numbers of regular-season games won by the New England Patriots[5] each NFL season from 2001-2019: 11, 9, 14, 14, 10, 12, 16, 11, 10, 14, 13, 12, 12, 12, 12, 14, 13, 11, 12.

The mean number of wins is 12.2, and a spreadsheet tells us that the standard deviation is 1.7 wins.
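If you do not have a spreadsheet handy, the same two numbers can be reproduced with a few lines of Python. This is a sketch under one assumption: the textbook does not say which spreadsheet function was used, so the population standard deviation (`statistics.pstdev`) is used here. For this data, the sample version rounds to the same value.

```python
import statistics

patriots_wins = [11, 9, 14, 14, 10, 12, 16, 11, 10, 14, 13, 12, 12, 12, 12, 14, 13, 11, 12]

mean_wins = statistics.mean(patriots_wins)
sd_wins = statistics.pstdev(patriots_wins)  # population standard deviation (assumed)

print(f"mean = {mean_wins:.1f} wins, standard deviation = {sd_wins:.1f} wins")
```

The output matches the values above: a mean of 12.2 wins and a standard deviation of 1.7 wins.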

  1. There is a 95% chance of the Patriots winning between _______ and _______ games in a season.
  2. In 2020, the Patriots won 7 games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?

The numbers of regular-season games won by the Buffalo Bills[6] each NFL season from 2001-2019: 3, 8, 6, 9, 5, 7, 7, 7, 6, 4, 6, 6, 6, 9, 8, 7, 9, 6, 10.

The mean number of wins is 6.8, and a spreadsheet tells us that the standard deviation is 1.7 wins.

  1. There is a 95% chance of the Bills winning between _______ and _______ games in a season.
  2. In 2020, the Bills won 13 games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?

The numbers of regular-season games won by the Denver Broncos[7] each NFL season from 2001-2019: 8, 9, 10, 10, 13, 9, 7, 8, 8, 4, 8, 13, 13, 12, 12, 9, 5, 6, 7.

The mean number of wins is 9.0, and a spreadsheet tells us that the standard deviation is 2.6 wins.

  1. There is a 95% chance of the Broncos winning between _______ and _______ games in a season.
  2. In 2020, the Broncos won 5 games. Could you have predicted that based on the data? How many standard deviations from the mean is this number of wins?
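The question “how many standard deviations from the mean?” comes down to one subtraction and one division. The sketch below uses made-up numbers (a mean of 50 and a standard deviation of 4) rather than the football data, and the function name is hypothetical.

```python
def standard_deviations_from_mean(value, mean, sd):
    """Return how many standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / sd


# Made-up numbers: a mean of 50 and a standard deviation of 4.
print(standard_deviations_from_mean(60, 50, 4))  # 2.5 standard deviations above
print(standard_deviations_from_mean(44, 50, 4))  # -1.5, meaning 1.5 below
```

A result beyond 2 or -2 falls outside the range that the 95% part of the rule covers.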


License

Technical Mathematics, 2nd Edition Copyright © 2024 by Morgan Chase is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.