
Graphical Representations of Summary Statistics

Ava Garcia




Study Guide Overview

This study guide covers five-number summaries (minimum, Q1, median, Q3, maximum) and their visualization using box plots. It explains how to construct box plots, detect outliers using the IQR, and interpret skew from box plots. The guide also includes practice questions and answers on these concepts.

Visualizing Data: Box Plots and Five-Number Summaries

Hey there, future AP Stats superstar! 👋 Let's dive into how we can use five-number summaries and box plots to really understand our data. This is a crucial topic, so let's make sure it sticks!

This section is super important because it combines measures of center and spread with graphical representations. Expect to see this on the exam!

Five Number Summaries

A five number summary is like a quick cheat sheet for your data. It gives you the minimum, Q1 (first quartile), median, Q3 (third quartile), and maximum values. Think of it as the ultimate data snapshot. 📸

Remember, quartiles split your data into four equal parts. Q1 marks the 25th percentile, the median (Q2) is the 50th percentile, and Q3 is the 75th percentile.

Quick Fact

The three quartiles (Q1, the median, and Q3) divide the ordered data into four sections, each containing about 25% of the data.

For example, let's look at this dataset: 5, 7, 8, 9, 10, 12, 15, 20, 25, 30

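  • Minimum: 5
  • Q1: 8 (the median of the lower half: 5, 7, 8, 9, 10)
  • Median: 11 (with 10 values, the average of the two middle values: (10 + 12) / 2 = 11)
  • Q3: 20 (the median of the upper half: 12, 15, 20, 25, 30)
  • Maximum: 30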

Easy peasy, right? 🎉 This summary gives you a quick look at the data's range, spread, and center.
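If you want to check a five-number summary by computer, here is a minimal Python sketch built on the standard library's `statistics.median`. It assumes the "median of each half" method used in this guide (when n is odd, the middle value is left out of both halves); calculators and software packages may use other quartile rules and give slightly different Q1 and Q3. The function name `five_number_summary` is just an illustrative choice.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumption: when n is odd, the middle value is left out of both halves.
    Other quartile conventions can give slightly different Q1 and Q3.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]              # values below the median position
    upper = values[half + n % 2:]      # skip the middle value when n is odd
    return values[0], median(lower), median(values), median(upper), values[-1]

print(five_number_summary([5, 7, 8, 9, 10, 12, 15, 20, 25, 30]))
# (5, 8, 11.0, 20, 30)
```

The median prints as 11.0 because, with an even number of values, `statistics.median` averages the two middle values.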

Memory Aid

To remember the order: Min-Q1-Med-Q3-Max. Think of it like a data lineup, always running from smallest to largest!

Box Plots

A box plot (or box-and-whisker plot) is a visual way to show the five-number summary. It helps you quickly see the distribution of your data and spot any outliers. 🕵️

Here's how to build one:

  1. Draw a horizontal line (the axis).
  2. Mark the minimum, Q1, median, Q3, and maximum values.
  3. Draw a box from Q1 to Q3. The median goes inside the box.
  4. Draw "whiskers" from the box out to the minimum and maximum. If there are outliers, each whisker stops at the most extreme data value that is not an outlier.
  5. Outliers are plotted as individual points beyond the whiskers. 🐭
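To see how the construction steps fit together, here is a rough Python sketch that draws a one-line text box plot from a five-number summary. The function `ascii_box_plot` and its character scheme (`-` for whiskers, `=` for the box with `[` and `]` at Q1 and Q3, `|` for the min, median, and max) are made up for illustration, and it assumes there are no outliers, so the whiskers run to the min and max.

```python
def ascii_box_plot(minimum, q1, med, q3, maximum, width=41):
    """Return a one-line text box plot built from a five-number summary.

    Assumes there are no outliers, so each whisker runs all the way to the
    minimum or maximum (steps 1-4 above; step 5 is not handled).
    """
    if not minimum <= q1 <= med <= q3 <= maximum:
        raise ValueError("summary values must be in increasing order")
    span = (maximum - minimum) or 1   # avoid dividing by zero if all values match

    def col(x):
        # Steps 1-2: map a data value to a character position on the axis.
        return round((x - minimum) / span * (width - 1))

    row = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"                  # step 4: whiskers (the box is drawn over them)
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                  # step 3: the box from Q1 to Q3
    row[col(q1)], row[col(q3)] = "[", "]"
    row[col(minimum)], row[col(maximum)] = "|", "|"
    row[col(med)] = "|"               # median drawn last so it is always visible
    return "".join(row)

print(ascii_box_plot(5, 8, 11, 20, 30, width=26))
# |--[==|========]---------|
```

For the example dataset, the median mark sits close to the left edge of the box and the right whisker is much longer than the left one.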
Exam Tip

Box plots are great for comparing distributions side-by-side. Always label your axes and include units!

Box Plot Example

Source: Simply Psychology

Detecting Outliers

We use the Interquartile Range (IQR) to find outliers. Remember: IQR = Q3 - Q1

We calculate fences to detect outliers:

  • Upper fence = Q3 + 1.5 * IQR
  • Lower fence = Q1 - 1.5 * IQR

Any data points beyond these fences are considered outliers and are plotted separately on the box plot.
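The fence arithmetic is easy to script. This Python sketch (the helper names `iqr_fences`, `find_outliers`, and `whisker_ends` are hypothetical) flags outliers with the 1.5 * IQR rule and finds where each whisker should stop: at the most extreme values still inside the fences. It takes Q1 and Q3 as inputs, so it works with whichever quartile method you used.

```python
def iqr_fences(q1, q3):
    """Return (lower_fence, upper_fence) for the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """List the values below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

def whisker_ends(data, q1, q3):
    """Whiskers stop at the smallest and largest values inside the fences."""
    lower, upper = iqr_fences(q1, q3)
    inside = [x for x in data if lower <= x <= upper]
    return min(inside), max(inside)

data = [5, 7, 8, 9, 10, 12, 15, 20, 25, 30]   # example dataset from this guide
print(iqr_fences(8, 20))            # (-10.0, 38.0)
print(find_outliers(data, 8, 20))   # []
print(whisker_ends(data, 8, 20))    # (5, 30)
```

With Q1 = 8 and Q3 = 20, every value lies between the fences of -10 and 38, so this dataset has no outliers and the whiskers reach the minimum and maximum.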

Key Concept

Outliers are data points that are unusually far from the rest of the data. They can pull summary statistics such as the mean and standard deviation away from what is typical, so it's important to identify them.

Box Plot with Outliers

Source: EzBioCloud

Box Plots and Skew

Box plots can also show you if your data is skewed or symmetric.

  • Symmetric: The median is in the middle of the box, and the whiskers are roughly equal in length.
  • Skewed: The median is closer to one end of the box, and one whisker is longer than the other. The skew is in the direction of the longer whisker.
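The whisker comparison can be written as a rough rule of thumb. In this Python sketch, `describe_skew` and its `tolerance` setting (whiskers within 10% of the range count as "roughly equal") are arbitrary assumptions, not a standard test, and it assumes no outliers, so the whiskers reach the min and max. A real judgment about skew should also consider where the median sits in the box.

```python
def describe_skew(minimum, q1, med, q3, maximum, tolerance=0.1):
    """Label skew from a five-number summary by comparing whisker lengths.

    `tolerance` is an arbitrary choice for this sketch: the fraction of the
    range within which the two whiskers count as roughly equal. `med` is
    accepted for completeness but not used by this simple rule.
    Assumes no outliers, so the whiskers reach the min and max.
    """
    lower_whisker = q1 - minimum
    upper_whisker = maximum - q3
    gap = upper_whisker - lower_whisker
    if abs(gap) <= tolerance * (maximum - minimum):
        return "roughly symmetric"
    # The skew points toward the longer whisker (the longer tail).
    return "skewed right" if gap > 0 else "skewed left"

print(describe_skew(5, 8, 11, 20, 30))   # skewed right
```

For the guide's example (min 5, Q1 8, median 11, Q3 20, max 30), the upper whisker (10) is much longer than the lower one (3), so the sketch reports right skew, matching the median's position near Q1.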
Common Mistake

Don't confuse the direction of the skew with the location of the median. The skew is towards the longer whisker, not towards the median.

Box Plot Skew Examples

Source: Statology

Key Vocabulary

  • Minimum
  • Quartile 1 (or First Quartile)
  • Median
  • Quartile 3 (or Third Quartile)
  • Maximum
  • Box plots
  • Fences

Practice Questions

Practice Question

(1) Which of the following is NOT a part of a five number summary?

A) Minimum value

B) First quartile

C) Median

D) Range

E) Third quartile

(2) Consider the following dataset of exam scores for a class of 30 students:

75, 80, 85, 85, 90, 90, 90, 95, 95, 95, 95, 95, 95, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100

A. Create a five number summary for the dataset.

B. Create a box plot for the dataset.

C. What can you conclude about the distribution of the exam scores based on the five number summary and the box plot?

(3) A researcher is studying the heights of a sample of 100 adults. The five number summary for the sample is:

Minimum value: 150 cm

First quartile: 160 cm

Median: 170 cm

Third quartile: 180 cm

Maximum value: 200 cm

Is a data point with a height of 220 cm considered an outlier according to the 1.5 x IQR rule?

Answers

(1) D) Range. A five number summary consists of the minimum value, the first quartile, the median, the third quartile, and the maximum value of a dataset. The range, which is the difference between the minimum and maximum values, is not a part of the five number summary.

(2) A. To create a five number summary for the dataset, you need to calculate the minimum value, the first quartile, the median, the third quartile, and the maximum value.

The minimum value is 75 and the maximum value is 100. With 30 scores, the median is the average of the 15th and 16th values, which are both 100, so the median is 100. To find the first quartile (Q1), you need to find the median of the lower half of the dataset. The lower half of the dataset consists of the first 15 scores, which are:

75, 80, 85, 85, 90, 90, 90, 95, 95, 95, 95, 95, 95, 100, 100

The median of the lower half is its 8th value, 95. To find the third quartile (Q3), you need to find the median of the upper half of the dataset. The upper half of the dataset consists of the last 15 scores, which are:

100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100

The median of the upper half of the dataset is 100. Therefore, the five number summary for the dataset is:

Minimum value: 75

First quartile: 95

Median: 100

Third quartile: 100

Maximum value: 100

B. I'll leave it up to you to draw the box plot and get some practice. 😉

C. Based on the five number summary and the box plot, you can conclude that the distribution of the exam scores is strongly skewed to the left, with a long tail of low scores. The median (100) sits at the right end of the box, equal to both Q3 and the maximum, so there is no upper whisker: more than half of the class (17 of 30 students) scored a perfect 100. The low side is far more spread out. The IQR is 100 - 95 = 5, so the lower fence is 95 - 1.5 x 5 = 87.5; the scores 75, 80, 85, and 85 fall below it and are plotted as outliers, and the lower whisker extends only to 90. Overall, most students scored very high, and a few low scores form the long left tail.
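To check a hand calculation like this one, you can let Python sort the scores and take the median of each half. This is a sketch using the standard library's `statistics.median` and the median-of-halves method from this guide; the variable names are illustrative.

```python
from statistics import median

# Exam scores from question (2): thirteen scores below 100, then seventeen 100s.
scores = [75, 80, 85, 85, 90, 90, 90] + [95] * 6 + [100] * 17
values = sorted(scores)
assert len(values) == 30

# n = 30 is even, so the lower and upper halves have 15 values each.
lower_half, upper_half = values[:15], values[15:]
q1 = median(lower_half)    # 8th value of the lower half
med = median(values)       # average of the 15th and 16th values
q3 = median(upper_half)
print(values[0], q1, med, q3, values[-1])   # 75 95 100.0 100 100

# 1.5 * IQR rule, needed before drawing the box plot in part B.
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in values if x < lower_fence or x > upper_fence]
inside = [x for x in values if lower_fence <= x <= upper_fence]
print(lower_fence, upper_fence)             # 87.5 107.5
print(outliers)                             # [75, 80, 85, 85]
print(min(inside), max(inside))             # 90 100 (whisker ends)
```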

(3) To answer this question, you need to calculate the interquartile range (IQR) of the sample. The IQR is the difference between the third quartile and the first quartile, and is a measure of the spread of the data. In this case, the IQR is 180 cm - 160 cm = 20 cm.

According to the 1.5 x IQR rule, a data point is considered an outlier if it is more than 1.5 times the IQR below the first quartile or more than 1.5 times the IQR above the third quartile.

In this case, 1.5 x IQR = 1.5 x 20 cm = 30 cm, so the upper fence is 180 cm + 30 cm = 210 cm (and the lower fence is 160 cm - 30 cm = 130 cm). A height of 220 cm is above the 210 cm upper fence, so it is considered an outlier.

The correct answer is: Yes, a height of 220 cm would be considered an outlier! (Notice that we didn't need the raw dataset or the box plot to decide this: the five number summary alone gives us Q1 and Q3, which is all the 1.5 x IQR rule needs.)
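Here is the same check as a tiny Python sketch (the `is_outlier` helper is hypothetical). It uses only Q1 and Q3 from the five number summary.

```python
# Values taken from the five number summary in question (3), in cm.
q1, q3 = 160, 180
iqr = q3 - q1                    # 20
lower_fence = q1 - 1.5 * iqr     # 130.0
upper_fence = q3 + 1.5 * iqr     # 210.0

def is_outlier(height):
    """Apply the 1.5 * IQR rule using only Q1 and Q3."""
    return height < lower_fence or height > upper_fence

print(is_outlier(220))   # True: 220 is above the 210 cm upper fence
print(is_outlier(200))   # False: the sample maximum is inside the fences
```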


Which of the following is NOT a component of a five-number summary? 🤔

Minimum value

First quartile (Q1)

Median

Interquartile Range (IQR)