Glossary
Box Plot
A graphical display of the five-number summary, showing the distribution, center, spread, and potential outliers of a dataset.
Example:
A box plot quickly revealed that while most students finished the race between 15 and 20 minutes, there was one very fast runner.
Five-Number Summary
A concise summary of a dataset's distribution, consisting of the minimum, first quartile (Q1), median, third quartile (Q3), and maximum values.
Example:
To quickly understand the spread of daily temperatures, a meteorologist might look at the five-number summary (Min: 50°F, Q1: 60°F, Med: 70°F, Q3: 75°F, Max: 85°F).
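A minimal Python sketch of computing a five-number summary with the standard library. The function name `five_number_summary` and the temperature list are made up for illustration. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is an assumption, not the only valid rule.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of at least two numbers."""
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # method="inclusive" is one common convention; other methods can give
    # slightly different Q1 and Q3 for the same data.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical daily high temperatures in °F (illustrative data only).
temps = [50, 55, 60, 65, 70, 72, 75, 80, 85]
print(five_number_summary(temps))
```

For this made-up list, the result matches the summary quoted in the example: minimum 50, Q1 60, median 70, Q3 75, maximum 85.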
Interquartile Range (IQR)
A measure of spread representing the range of the middle 50% of the data, calculated as Q3 - Q1.
Example:
An IQR of 15 for the heights of a group of teenagers means the middle 50% of their heights span a 15-inch range.
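As a rough illustration, the IQR can be computed by finding Q1 and Q3 and subtracting. The helper name `iqr` and the height values are hypothetical, and the quartile method is an assumption (other conventions may give a slightly different answer).

```python
import statistics

def iqr(values):
    """Interquartile range: Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical heights in inches (illustrative data only).
heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
print(iqr(heights))  # Q1 = 64, Q3 = 72, so the IQR is 8
```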
Lower Fence
The lower boundary used to identify potential low outliers, calculated as Q1 - 1.5 * IQR.
Example:
A data point below the lower fence of 20 would be flagged as an unusually low value.
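A small sketch of the lower-fence arithmetic, reusing Q1 = 60 and Q3 = 75 from the temperature example under the five-number summary entry. The function name `lower_fence` is hypothetical.

```python
def lower_fence(q1, q3):
    """Lower fence for the 1.5 * IQR rule: Q1 - 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr

# IQR = 75 - 60 = 15, so the fence is 60 - 1.5 * 15 = 37.5
print(lower_fence(60, 75))
```

With those quartiles, any temperature below 37.5°F would be flagged as a potential low outlier.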
Maximum
The largest value in a dataset.
Example:
The maximum number of points scored by a basketball team in a game was 125.
Median
The middle value of an ordered dataset, which divides the data into two equal halves (50% below, 50% above).
Example:
The median commute time for employees was 30 minutes, indicating half of the employees commute less than 30 minutes and half commute more.
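A short sketch showing how the median behaves for odd and even counts, using Python's standard `statistics.median`. The commute times are made up. With an even number of values, this function returns the average of the two middle values.

```python
import statistics

# Hypothetical commute times in minutes (illustrative data only).
odd_count = [40, 20, 30, 25, 35]   # five values: the middle one after sorting is 30
even_count = [40, 20, 25, 35]      # four values: the middle pair is 25 and 35

median_odd = statistics.median(odd_count)
median_even = statistics.median(even_count)

print(median_odd)   # the single middle value
print(median_even)  # the average of 25 and 35
```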
Minimum
The smallest value in a dataset.
Example:
The minimum number of hours a student studied for the AP Stats exam was 2, which is quite low!
Outliers
Data points that are unusually far from the rest of the data. A common test is the 1.5 * IQR rule, which flags any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
Example:
A student who scored a perfect 100 on a notoriously difficult exam, while most others scored in the 60s, might be considered an outlier.
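One way to apply the 1.5 * IQR rule in code is sketched below. The function name `find_outliers` and the exam scores are hypothetical, and the quartile method is an assumption, so a different convention could move the fences slightly.

```python
import statistics

def find_outliers(values):
    """Return the values that fall outside the 1.5 * IQR fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = q1 - 1.5 * iqr    # lower fence
    high = q3 + 1.5 * iqr   # upper fence
    return [v for v in values if v < low or v > high]

# Hypothetical exam scores: most in the 60s, one perfect score.
scores = [58, 60, 61, 62, 63, 64, 65, 66, 100]
print(find_outliers(scores))  # Q1 = 61, Q3 = 65, fences at 55 and 71
```

Here only the score of 100 lies beyond a fence, so it is the one value flagged.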
Q1 (First Quartile)
The value at or below which about 25% of the data falls, also known as the 25th percentile.
Example:
If the Q1 for test scores was 70, it means 25% of students scored 70 or below.
Q3 (Third Quartile)
The value at or below which about 75% of the data falls, also known as the 75th percentile.
Example:
With a Q3 of 92 for quiz scores, 75% of the class scored 92 or lower.
Skewed (distribution)
A data distribution with a longer tail on one side. In a box plot, the median sits closer to one end of the box and one whisker is noticeably longer; the skew is in the direction of the longer whisker.
Example:
Income data is typically skewed to the right, meaning there are a few very high earners pulling the tail in that direction.
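A rough heuristic for reading skew off a five-number summary is sketched below. It is only an illustration: the function name `skew_hint`, the 25% tolerance, and the sample summaries are all assumptions. It compares whisker lengths only, and it treats the minimum and maximum as the whisker ends, which ignores any outliers a box plot would draw separately.

```python
def skew_hint(minimum, q1, median, q3, maximum, tolerance=0.25):
    """Guess the skew direction by comparing the two whisker lengths.

    The median is accepted so the whole summary can be passed in,
    but this simple version does not use it.
    """
    lower_whisker = q1 - minimum
    upper_whisker = maximum - q3
    longer = max(lower_whisker, upper_whisker)
    # Treat the whiskers as "about equal" if they differ by at most
    # `tolerance` times the longer one (an arbitrary cutoff).
    if longer == 0 or abs(upper_whisker - lower_whisker) <= tolerance * longer:
        return "roughly symmetric"
    return "right-skewed" if upper_whisker > lower_whisker else "left-skewed"

# Hypothetical incomes in thousands of dollars: a long upper whisker.
print(skew_hint(20, 35, 50, 80, 400))   # right-skewed
# Hypothetical evenly spread data: equal whiskers.
print(skew_hint(50, 60, 70, 80, 90))    # roughly symmetric
```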
Symmetric (distribution)
A data distribution that is balanced around its center. In a box plot, the median is roughly in the middle of the box and the whiskers are approximately equal in length.
Example:
A dataset of adult shoe sizes often shows a symmetric distribution, with roughly equal numbers of sizes above and below the average.
Upper Fence
The upper boundary used to identify potential high outliers, calculated as Q3 + 1.5 * IQR.
Example:
Any data point above the upper fence of 95 would be considered an unusually high score.
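A small sketch of the upper-fence calculation and a check against it, reusing Q1 = 60 and Q3 = 75 from the temperature example under the five-number summary entry. The function names `upper_fence` and `is_high_outlier` are hypothetical.

```python
def upper_fence(q1, q3):
    """Upper fence for the 1.5 * IQR rule: Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q3 + 1.5 * iqr

def is_high_outlier(value, q1, q3):
    """True if the value lies strictly above the upper fence."""
    return value > upper_fence(q1, q3)

# IQR = 75 - 60 = 15, so the fence is 75 + 1.5 * 15 = 97.5
print(upper_fence(60, 75))
print(is_high_outlier(99, 60, 75))  # 99 is above 97.5
print(is_high_outlier(85, 60, 75))  # 85 is not
```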