Glossary
1.5 x IQR Rule
A common method to identify outliers, where values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are considered outliers.
Example:
Using the 1.5 x IQR rule, a data point of 100 in a dataset with Q3=40 and IQR=20 would be identified as an outlier because 100 > (40 + 1.5*20 = 70).
Degrees of Freedom
The number of independent values or pieces of information that are free to vary in a calculation, often n-1 for sample standard deviation.
Example:
When calculating the sample standard deviation, we use n-1 for the degrees of freedom to provide a better estimate of the population variability.
Interquartile Range (IQR)
The range of the middle 50% of the data, calculated as the difference between the third quartile (Q3) and the first quartile (Q1).
Example:
The IQR of student study times was 10 hours, meaning the middle half of students varied by 10 hours in their study duration.
Left-Skewed
A distribution where the tail extends to the left, meaning the mean is typically less than the median.
Example:
The distribution of scores on an easy exam might be left-skewed, with most students scoring high and only a few scoring very low.
Mean (x̄)
The arithmetic average of a dataset, calculated by summing all values and dividing by the number of values.
Example:
The mean score on a recent quiz was 85, indicating the average performance of the class.
Median
The middle value of a dataset when the data is ordered from least to greatest.
Example:
In a dataset of house prices, the median price is often used because it's not skewed by a few extremely expensive mansions.
Non-resistant
A statistic that is significantly affected by extreme values or outliers in a dataset.
Example:
The non-resistant nature of the mean means that one extremely high salary in a company can drastically inflate the reported average salary.
Nonresistant Measures
Summary statistics that are sensitive to extreme values or outliers, such as the mean, standard deviation, and range.
Example:
Because the mean is a nonresistant measure, a single very large data entry error could significantly distort the average of a dataset.
Outliers
Data points that are unusually far from the rest of the data in a dataset.
Example:
A student scoring a 10 on a test where everyone else scored above 70 would be considered an outlier.
Parameters
Numerical summaries that describe an entire population.
Example:
The true average GPA of all college students in the U.S. is a parameter.
Populations
The entire group of individuals or instances about which we want to gather information.
Example:
If you're studying the effectiveness of a new drug, the population would be all individuals who could potentially take that drug.
Q1 (First Quartile)
The median of the lower half of an ordered dataset, representing the 25th percentile.
Example:
If the Q1 for commute times is 15 minutes, 25% of commuters take 15 minutes or less to get to work.
Q3 (Third Quartile)
The median of the upper half of an ordered dataset, representing the 75th percentile.
Example:
A Q3 of 45 minutes for workout durations means that 75% of people work out for 45 minutes or less.
Resistant
A statistic that is not significantly affected by extreme values or outliers in a dataset.
Example:
Because the median is resistant to outliers, it's a better measure of typical home prices in a neighborhood where one mansion might distort the average.
Resistant Measures
Summary statistics that are not heavily influenced by extreme values or outliers, such as the median and IQR.
Example:
When analyzing highly skewed data like wealth distribution, resistant measures like the median and IQR provide a more accurate picture of typical values.
Right-Skewed
A distribution where the tail extends to the right, meaning the mean is typically greater than the median.
Example:
The distribution of the number of children per family is often right-skewed, as most families have 1-3 children, but a few have many more.
Samples
A subset of individuals selected from a larger population for study.
Example:
To estimate the average height of all high school students, you might measure the heights of a sample of 200 students.
Skewed Distributions
A distribution that is not symmetric, having a longer 'tail' on one side.
Example:
Income data often shows a skewed distribution to the right, with most people earning lower incomes and a few earning very high incomes.
Standard Deviation (s)
A measure of the typical distance or spread of data points from the mean.
Example:
A small standard deviation for test scores indicates that most students scored very close to the class average.
Statistics
Numerical summaries calculated from sample data.
Example:
If you survey 50 students about their favorite ice cream flavor, the percentage who prefer chocolate is a statistic.
Symmetric Distributions
A distribution where the left and right sides are approximate mirror images of each other.
Example:
A dataset of adult heights often forms a symmetric distribution, with most people around the average height and fewer at very short or very tall extremes.