Glossary
Central Limit Theorem (CLT)
A fundamental theorem stating that when sample sizes are sufficiently large (a common rule of thumb is n ≥ 30 for each sample), the sampling distribution of the sample mean (or of the difference in sample means) is approximately normal, regardless of the shape of the original population distribution, provided the population has a finite variance.
Example:
Even if individual incomes in two cities are highly skewed, with large enough samples (e.g., n = 50 from each city) the Central Limit Theorem implies that the distribution of the difference in sample mean incomes is approximately normal, which allows normal-based inference.
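The income example can be checked with a small simulation. The sketch below is illustrative only: the exponential "income" populations, their means (50 and 40), the number of repetitions, and the helper names `skewness` and `simulate_differences` are all assumptions, not part of the glossary. Skewness near 0 is used as a rough indicator of symmetry, not as a full test of normality.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; roughly 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def simulate_differences(n=50, reps=2000, seed=0):
    """Draw many pairs of samples and record the difference in sample means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Hypothetical right-skewed populations (exponential, means 50 and 40)
        sample_a = [rng.expovariate(1 / 50) for _ in range(n)]
        sample_b = [rng.expovariate(1 / 40) for _ in range(n)]
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    return diffs

rng = random.Random(1)
raw_incomes = [rng.expovariate(1 / 50) for _ in range(5000)]
diffs = simulate_differences()

print("skewness of individual incomes:", round(skewness(raw_incomes), 2))
print("skewness of differences in sample means:", round(skewness(diffs), 2))
```

The individual values are strongly right-skewed, while the differences in sample means should come out close to symmetric, which is what the theorem predicts for samples of this size.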
Mean of the sampling distribution
The expected value of the difference between two sample means, which is equal to the true difference between the two population means (μ1 - μ2).
Example:
If the true average commute time is 30 minutes for city A and 20 minutes for city B, the mean of the sampling distribution of the difference (A - B) is 10 minutes, the average difference you would expect to see across many pairs of samples.
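A short simulation can illustrate the commute-time example. In this sketch the population means (30 and 20) come from the example above; the normal shape, the standard deviations, the sample size, and the function name `mean_of_differences` are assumptions made only for illustration.

```python
import random
import statistics

MU_A, MU_B = 30, 20      # true mean commute times from the example (minutes)
SIGMA_A, SIGMA_B = 8, 6  # assumed standard deviations (not given in the example)

def mean_of_differences(n=40, reps=2000, seed=0):
    """Average the difference in sample means over many pairs of samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        sample_a = [rng.gauss(MU_A, SIGMA_A) for _ in range(n)]
        sample_b = [rng.gauss(MU_B, SIGMA_B) for _ in range(n)]
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    return statistics.fmean(diffs)

average_difference = mean_of_differences()
# Should land near MU_A - MU_B = 10, though not exactly on it
print(round(average_difference, 2))
```

Any single pair of samples can give a difference well above or below 10; it is the long-run average of those differences that settles near μ1 - μ2.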
Sampling Distributions of the Difference Between Two Means
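The distribution of all possible differences between sample means (x̄1 - x̄2) that would be obtained if every possible pair of independent samples of fixed sizes n1 and n2 were drawn from two populations. In practice it is approximated by repeatedly drawing many pairs of samples.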
Example:
If you repeatedly take samples of student test scores from two different teaching methods and calculate the difference in their average scores, the distribution of all those differences forms the sampling distribution of the difference between two means.
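For very small populations, the sampling distribution can be listed exhaustively instead of approximated. The sketch below uses two made-up sets of test scores (`scores_a` and `scores_b` are hypothetical) and samples of size 2 drawn without replacement, then enumerates every possible difference in sample means.

```python
from itertools import combinations, product
from statistics import fmean

# Hypothetical tiny "populations" of test scores for two teaching methods
scores_a = [70, 80, 90, 100]
scores_b = [60, 75, 85]

# Every possible sample of size 2 from each population, reduced to its mean
means_a = [fmean(sample) for sample in combinations(scores_a, 2)]
means_b = [fmean(sample) for sample in combinations(scores_b, 2)]

# Every possible pairing of a sample from A with a sample from B
differences = [a - b for a, b in product(means_a, means_b)]

print(len(differences), "possible differences")
print("mean of all differences:", fmean(differences))
print("difference in population means:", fmean(scores_a) - fmean(scores_b))
```

The list `differences` is the full sampling distribution for this toy setup, and its mean matches the difference between the two population means.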
Self-selection bias
A type of bias that occurs when individuals choose to participate in a study, leading to a sample that is not representative of the population because those who choose to participate may differ systematically from those who do not.
Example:
If a survey about student satisfaction is only completed by students who actively choose to click a link in an email, the results might suffer from self-selection bias because highly satisfied or highly dissatisfied students might be more inclined to respond, skewing the overall findings.
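The effect described in this example can be sketched with a simulation. Everything numeric here is an assumption chosen for illustration: the 1-5 satisfaction scale, the population weights, and the response rates that make students with extreme opinions more likely to respond.

```python
import random

def simulate_survey(pop_size=10000, seed=0):
    """Return (population scores, scores of students who chose to respond)."""
    rng = random.Random(seed)
    # Hypothetical population: satisfaction on a 1-5 scale, mostly moderate
    population = rng.choices([1, 2, 3, 4, 5], weights=[5, 20, 50, 20, 5], k=pop_size)
    # Assumed response rates: extreme opinions are much more likely to click the link
    response_rate = {1: 0.6, 2: 0.1, 3: 0.05, 4: 0.1, 5: 0.6}
    respondents = [s for s in population if rng.random() < response_rate[s]]
    return population, respondents

def share_extreme(scores):
    """Fraction of scores that are 1 (very dissatisfied) or 5 (very satisfied)."""
    return sum(s in (1, 5) for s in scores) / len(scores)

population, respondents = simulate_survey()
print("extreme share in population:", round(share_extreme(population), 2))
print("extreme share among respondents:", round(share_extreme(respondents), 2))
```

Under these assumed response rates, extreme opinions are heavily overrepresented among respondents, so the survey results would not describe the student body as a whole.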
Standard deviation of the sampling distribution of the difference between two means
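A measure of the variability or spread of the sampling distribution of the difference between two sample means. For independent samples it is calculated by adding the variances of the two sample means and taking the square root: √(σ1²/n1 + σ2²/n2), where σ1 and σ2 are the population standard deviations and n1 and n2 are the sample sizes.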
Example:
When comparing the average heights of male and female students, the standard deviation of the sampling distribution of the difference between two means tells you how much the observed differences in average heights are expected to vary from sample to sample.
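As a worked illustration, the standard formula for independent samples, √(σ1²/n1 + σ2²/n2), can be computed directly. The height standard deviations and sample sizes below are hypothetical, and `sd_of_difference` is just an illustrative helper name.

```python
import math

def sd_of_difference(sigma1, n1, sigma2, n2):
    """sqrt(sigma1^2/n1 + sigma2^2/n2); assumes the two samples are independent."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical values: population standard deviations of height in cm
small_samples = sd_of_difference(7.5, 40, 7.0, 50)
large_samples = sd_of_difference(7.5, 400, 7.0, 500)

print("n1=40,  n2=50: ", round(small_samples, 3))
print("n1=400, n2=500:", round(large_samples, 3))
```

Multiplying both sample sizes by 10 divides the standard deviation by √10, so differences in average heights vary less from sample to sample when the samples are larger.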