Glossary

1.5 x IQR Rule

Criticality: 3

A common method to identify outliers, where values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are considered outliers.

Example:

Using the 1.5 x IQR rule, a data point of 100 in a dataset with Q3=40 and IQR=20 would be identified as an outlier because 100 > (40 + 1.5*20 = 70).

Degrees of Freedom

Criticality: 2

The number of independent values or pieces of information that are free to vary in a calculation, often n-1 for sample standard deviation.

Example:

When calculating the sample standard deviation, we use n-1 for the degrees of freedom to provide a better estimate of the population variability.

Interquartile Range (IQR)

Criticality: 3

The range of the middle 50% of the data, calculated as the difference between the third quartile (Q3) and the first quartile (Q1).

Example:

The IQR of student study times was 10 hours, meaning the middle half of students varied by 10 hours in their study duration.

Left-Skewed

Criticality: 3

A distribution where the tail extends to the left, meaning the mean is typically less than the median.

Example:

The distribution of scores on an easy exam might be left-skewed, with most students scoring high and only a few scoring very low.

Mean (x̄)

Criticality: 3

The arithmetic average of a dataset, calculated by summing all values and dividing by the number of values.

Example:

The mean score on a recent quiz was 85, indicating the average performance of the class.

Median

Criticality: 3

The middle value of a dataset when the data is ordered from least to greatest.

Example:

In a dataset of house prices, the median price is often used because it's not skewed by a few extremely expensive mansions.

Non-resistant

Criticality: 3

A statistic that is significantly affected by extreme values or outliers in a dataset.

Example:

The non-resistant nature of the mean means that one extremely high salary in a company can drastically inflate the reported average salary.

Nonresistant Measures

Criticality: 3

Summary statistics that are sensitive to extreme values or outliers, such as the mean, standard deviation, and range.

Example:

Because the mean is a nonresistant measure, a single very large data entry error could significantly distort the average of a dataset.

Outliers

Criticality: 3

Data points that are unusually far from the rest of the data in a dataset.

Example:

A student scoring a 10 on a test where everyone else scored above 70 would be considered an outlier.

Parameters

Criticality: 2

Numerical summaries that describe an entire population.

Example:

The true average GPA of all college students in the U.S. is a parameter.

Populations

Criticality: 2

The entire group of individuals or instances about which we want to gather information.

Example:

If you're studying the effectiveness of a new drug, the population would be all individuals who could potentially take that drug.

Q1 (First Quartile)

Criticality: 3

The median of the lower half of an ordered dataset, representing the 25th percentile.

Example:

If the Q1 for commute times is 15 minutes, 25% of commuters take 15 minutes or less to get to work.

Q3 (Third Quartile)

Criticality: 3

The median of the upper half of an ordered dataset, representing the 75th percentile.

Example:

A Q3 of 45 minutes for workout durations means that 75% of people work out for 45 minutes or less.

Resistant

Criticality: 3

A statistic that is not significantly affected by extreme values or outliers in a dataset.

Example:

Because the median is resistant to outliers, it's a better measure of typical home prices in a neighborhood where one mansion might distort the average.

Resistant Measures

Criticality: 3

Summary statistics that are not heavily influenced by extreme values or outliers, such as the median and IQR.

Example:

When analyzing highly skewed data like wealth distribution, resistant measures like the median and IQR provide a more accurate picture of typical values.

Right-Skewed

Criticality: 3

A distribution where the tail extends to the right, meaning the mean is typically greater than the median.

Example:

The distribution of the number of children per family is often right-skewed, as most families have 1-3 children, but a few have many more.

Samples

Criticality: 2

A subset of individuals selected from a larger population for study.

Example:

To estimate the average height of all high school students, you might measure the heights of a sample of 200 students.

Skewed Distributions

Criticality: 3

A distribution that is not symmetric, having a longer 'tail' on one side.

Example:

Income data often shows a skewed distribution to the right, with most people earning lower incomes and a few earning very high incomes.

Standard Deviation (s)

Criticality: 3

A measure of the typical distance or spread of data points from the mean.

Example:

A small standard deviation for test scores indicates that most students scored very close to the class average.

Statistics

Criticality: 2

Numerical summaries calculated from sample data.

Example:

If you survey 50 students about their favorite ice cream flavor, the percentage who prefer chocolate is a statistic.

Symmetric Distributions

Criticality: 2

A distribution where the left and right sides are approximate mirror images of each other.

Example:

A dataset of adult heights often forms a symmetric distribution, with most people around the average height and fewer at very short or very tall extremes.