Glossary
25th Percentile (Q1)
The value at or below which about 25% of the data fall when ordered from least to greatest. It is also known as the first quartile.
Example:
If the 25th percentile for commute times is 15 minutes, it means 25% of commuters take 15 minutes or less to get to work.
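As a rough sketch of how the first quartile can be computed, the snippet below uses Python's standard statistics module on a small set of made-up commute times. The data values and the choice of method="inclusive" are assumptions for illustration; textbooks and software use several quartile conventions that can give slightly different answers on small datasets.

```python
import statistics

# Hypothetical commute times in minutes (illustrative values only).
commute_minutes = [10, 12, 16, 18, 20, 25, 30, 35]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
# method="inclusive" is one common convention, chosen here as an assumption.
q1, q2, q3 = statistics.quantiles(commute_minutes, n=4, method="inclusive")

# Fraction of commuters whose time is at or below Q1.
share_at_or_below = sum(t <= q1 for t in commute_minutes) / len(commute_minutes)

print("Q1:", q1)
print("Share at or below Q1:", share_at_or_below)
```

With these values, one quarter of the commute times sit at or below Q1, which matches the definition above.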
Box Plot
A graphical display that summarizes the distribution of a quantitative variable using five key values: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It's excellent for comparing distributions and identifying outliers.
Example:
To quickly compare the spread and typical values of daily temperatures in two different cities over a month, you could use a box plot for each city.
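The five values behind a box plot can be computed directly. The sketch below assumes made-up temperature readings and a hypothetical helper named five_number_summary; it uses the "inclusive" quartile convention from Python's statistics module, which may differ slightly from the convention used in a given textbook or calculator.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Hypothetical daily high temperatures in degrees Fahrenheit.
city_a = [58, 60, 61, 63, 64, 66, 67, 70, 72]
city_b = [45, 52, 59, 63, 65, 70, 76, 81, 88]

summary_a = five_number_summary(city_a)
summary_b = five_number_summary(city_b)

print("City A:", summary_a)
print("City B:", summary_b)
```

In this made-up data the two medians are close, but City B's box and whiskers would be much longer, which is the kind of contrast side-by-side box plots make visible.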
Center (of a distribution)
Represents the typical or central value of a distribution, often measured by the mean or median.
Example:
The center of the distribution of commute times for employees at a company might be around 25 minutes, indicating a typical commute length.
Comparing Distributions
The process of analyzing and contrasting the characteristics (shape, center, spread, outliers) of two or more datasets, often using graphical displays.
Example:
A researcher might compare the distributions of test scores for students who used a new study method versus those who used a traditional method to see which is more effective.
Histogram
A graphical display that uses bars to show the frequency or relative frequency of data values within defined intervals (bins). It's useful for visualizing the shape of larger datasets.
Example:
A histogram might display the distribution of heights of all students in a high school, showing how many students fall into height ranges such as 60 to under 65 inches, 65 to under 70 inches, and so on, so that each height belongs to exactly one bin.
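The binning step of a histogram can be sketched in a few lines. The heights, bin width, and starting edge below are assumptions for illustration, and each bin is treated as half-open (it includes its lower edge but not its upper edge) so that no value is counted twice.

```python
from collections import Counter

# Hypothetical student heights in inches.
heights = [61, 62, 64, 65, 66, 66, 67, 68, 69, 70, 71, 73]

BIN_WIDTH = 5   # assumed bin width
BIN_START = 60  # assumed lower edge of the first bin

def bin_lower_edge(value):
    """Lower edge of the half-open bin [edge, edge + BIN_WIDTH) holding value."""
    return BIN_START + ((value - BIN_START) // BIN_WIDTH) * BIN_WIDTH

# Count how many heights land in each bin.
counts = Counter(bin_lower_edge(h) for h in heights)

# Print a simple text histogram, one row per bin.
for edge in sorted(counts):
    bar = "#" * counts[edge]
    print(f"{edge} to under {edge + BIN_WIDTH}: {bar}")
```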
Interquartile Range (IQR)
The range of the middle 50% of the data, calculated as the difference between the third quartile (Q3) and the first quartile (Q1). It is a resistant measure of spread.
Example:
If the interquartile range of house prices in a neighborhood is $50,000, it means the prices of the middle half of the houses span a $50,000 interval, from Q1 to Q3.
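To illustrate why the IQR is called resistant, the sketch below computes it for made-up house prices, then replaces the most expensive house with an extreme value. The prices, the helper name iqr, and the "inclusive" quartile convention are all assumptions for illustration.

```python
import statistics

def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical house prices in thousands of dollars.
prices = [200, 220, 240, 250, 260, 270, 290, 300, 310]

# Same neighborhood, but the most expensive house is replaced by a mansion.
prices_with_mansion = prices[:-1] + [2000]

for label, data in [("original", prices), ("with mansion", prices_with_mansion)]:
    full_range = max(data) - min(data)
    print(label, "IQR:", iqr(data), "range:", full_range)
```

In this example the range jumps from 110 to 1800 when the mansion is added, while the IQR stays at 50.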
Mean
The arithmetic average of a dataset, calculated by summing all values and dividing by the number of values. It is sensitive to outliers and skewness.
Example:
If you add up all the points scored by a basketball team in a season and divide by the number of games, you get the mean points per game.
Median
The middle value in an ordered dataset, dividing the data into two equal halves. When the number of values is even, the median is the average of the two middle values. It is resistant to outliers and skewness.
Example:
In a list of student heights, if you arrange them from shortest to tallest, the height of the student exactly in the middle is the median height.
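A minimal sketch of finding the median by hand, covering both an odd and an even number of values. The heights and the helper name median_by_hand are made up for illustration; the result is compared with statistics.median from the Python standard library.

```python
import statistics

def median_by_hand(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical student heights in inches.
odd_heights = [68, 60, 71, 62, 65]        # 5 values: one middle value
even_heights = [68, 60, 71, 62, 65, 74]   # 6 values: two middle values

print(median_by_hand(odd_heights), statistics.median(odd_heights))
print(median_by_hand(even_heights), statistics.median(even_heights))
```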
Outlier
A data point that lies unusually far from the bulk of the other values in a dataset. Outliers can significantly affect the mean and range.
Example:
If most students finish a quiz in 15-20 minutes, but one student takes 45 minutes, that 45-minute time would be considered an outlier.
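This glossary does not give a numerical test for outliers. One widely taught convention, assumed here for illustration, flags any value more than 1.5 times the IQR below Q1 or above Q3. The sketch below applies that rule to made-up quiz times; the helper name iqr_outliers and the "inclusive" quartile convention are also assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3 (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low_fence = q1 - k * spread
    high_fence = q3 + k * spread
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical quiz completion times in minutes.
quiz_minutes = [15, 16, 16, 17, 18, 18, 19, 20, 45]

print(iqr_outliers(quiz_minutes))
```

For these times the fences fall at 11.5 and 23.5 minutes, so only the 45-minute time is flagged.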
Range
The difference between the maximum and minimum values in a dataset, providing a simple measure of the overall spread.
Example:
If the highest temperature recorded in a week was 90°F and the lowest was 60°F, the range of temperatures for that week was 30°F.
SOCS
An acronym used to remember the four key aspects to discuss when describing or comparing distributions: Shape, Outliers, Center, and Spread.
Example:
When asked to describe the distribution of student test scores, a student should remember to address SOCS: its shape (e.g., symmetric), any outliers, its center (e.g., median score), and its spread (e.g., IQR).
Shape (of a distribution)
Describes the overall form of a distribution, including its symmetry, skewness, and number of peaks (modes).
Example:
When looking at a histogram of exam scores, you might describe its shape as 'skewed to the left' if most students scored high and a few low scores form a long tail on the left.
Skewed to the Left (Negatively Skewed)
A distribution where the tail extends further to the left, indicating that most data values are concentrated on the higher end, with a few lower values typically pulling the mean to the left of (below) the median.
Example:
The distribution of scores on an easy exam might be skewed to the left, with most students scoring high and only a few scoring low.
Skewed to the Right (Positively Skewed)
A distribution where the tail extends further to the right, indicating that most data values are concentrated on the lower end, with a few higher values typically pulling the mean to the right of (above) the median.
Example:
The distribution of household incomes is often skewed to the right, as most households earn moderate incomes, but a few very high incomes pull the average up.
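The effect of skewness on the mean and median, described in the two skewness entries above, can be checked numerically. The incomes and exam scores below are made up for illustration.

```python
import statistics

# Hypothetical household incomes in thousands of dollars:
# mostly moderate, with one very high value (right-skewed).
incomes = [32, 38, 41, 45, 48, 52, 55, 60, 350]

# Hypothetical scores on an easy exam:
# mostly high, with one very low value (left-skewed).
scores = [35, 78, 82, 85, 88, 90, 92, 94, 96]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Right skew: the high tail pulls the mean above the median.
print("incomes: mean > median?", mean_income > median_income)

# Left skew: the low tail pulls the mean below the median.
print("scores: mean < median?", mean_score < median_score)
```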
Spread (of a distribution)
Describes the variability or dispersion of data values within a distribution, often measured by range, interquartile range (IQR), or standard deviation.
Example:
If the spread of test scores in one class is much wider than in another, it means there is more variability in performance among the students in that class.
Stem-and-Leaf Plot (Stem Plot)
A graphical display that shows the shape of the distribution while preserving the individual data values. Each value is split into a 'stem' (the leading digit or digits) and a 'leaf' (the final digit).
Example:
A stem-and-leaf plot could show the ages of participants in a survey, with stems representing tens (e.g., '2' for the 20s) and leaves representing units, so the row '2 | 3 5 8' records the ages 23, 25, and 28.
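A stem-and-leaf plot is simple enough to build by hand or in code. The sketch below assumes made-up ages that are non-negative whole numbers and uses a hypothetical helper named stem_and_leaf. For brevity it lists only the stems that have at least one leaf, whereas a full plot would normally also show empty stems.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted list of leaves (units digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical ages of survey participants.
ages = [34, 23, 47, 25, 31, 28, 40, 34]

plot = stem_and_leaf(ages)

# Print one row per stem, smallest stem first.
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```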
Symmetric (Distribution)
A distribution where the left and right sides are approximate mirror images of each other around the center. For symmetric distributions, the mean and median are approximately equal.
Example:
The distribution of weights of bags of a specific brand of potato chips, if the manufacturing process is consistent, should be roughly symmetric around the target weight.
Unimodal
A distribution that has a single, distinct peak or mode.
Example:
A histogram showing the heights of adult women would likely be unimodal, with one central peak around the average height.