Glossary
Central Limit Theorem (CLT)
A fundamental theorem stating that, for a large number of independent, identically distributed random variables with finite variance, the sampling distribution of the sample mean (or sum) is approximately normal, regardless of the shape of the original population distribution.
Example:
Even if the distribution of individual commute times is skewed, the Central Limit Theorem ensures that the distribution of average commute times from many large samples will be approximately normal.
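The commute-time example can be checked with a small simulation. This is a minimal sketch that assumes, purely for illustration, exponentially distributed commute times with a mean of 30 minutes (a strongly right-skewed population). It compares the skewness of individual times with the skewness of means from samples of 100. For an exponential population the skewness is 2, and the skewness of the sample mean shrinks to $2/\sqrt{n} = 0.2$ for $n = 100$, so the means look far more symmetric and normal.

```python
import random
import statistics

def skewness(values):
    """Population-style skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from `n` independent draws."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Hypothetical skewed population: exponential commute times, mean 30 minutes.
def draw_commute(rng):
    return rng.expovariate(1 / 30)

rng = random.Random(42)  # fixed seed so the run is reproducible
individual = [draw_commute(rng) for _ in range(20_000)]
means = sample_means(draw_commute, n=100, reps=2_000, rng=rng)

print(f"skewness of individual commute times: {skewness(individual):.2f}")
print(f"skewness of sample means (n = 100):   {skewness(means):.2f}")
```

Raising `n` pushes the skewness of the means even closer to 0, which is the CLT at work.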
Confidence Intervals
A range of plausible values for an unknown population parameter, calculated from sample data using a method that captures the true parameter value in a specified percentage of repeated samples (the confidence level).
Example:
A 90% confidence interval for the average height of adult males might be (68 inches, 70 inches), meaning we are 90% confident the true average height falls within this range: the method used to build it captures the true mean in about 90% of all possible samples.
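Because most entries in this glossary concern proportions, here is a minimal sketch of a one-proportion z-interval, $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$. The function name `one_proportion_z_interval` and the counts (120 "yes" answers out of 200) are hypothetical, chosen only for illustration; the interval is only trustworthy when the Large Counts Condition holds (here 120 and 80 are both at least 10).

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    # Critical value z*: leaves (1 - confidence) / 2 in each tail of N(0, 1).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 of 200 sampled adults answer "yes".
lower, upper = one_proportion_z_interval(120, 200, confidence=0.90)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```

This prints `90% CI: (0.543, 0.657)`. Raising `confidence` to 0.95 widens the interval, since $z^*$ grows from about 1.645 to about 1.960.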
Hypothesis Testing
A statistical method for making decisions about a population parameter from sample data, typically by assessing how likely results at least as extreme as those observed would be if a null hypothesis were true.
Example:
A company might use hypothesis testing to determine if a new marketing campaign significantly increased customer engagement compared to the old one.
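As a minimal sketch of the procedure, the one-proportion z-test below computes the test statistic $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$ and its p-value. For simplicity it assumes the old campaign's engagement rate is a known benchmark $p_0 = 0.20$ (a full comparison of two campaigns would use a two-proportion test instead); the helper name and the data (120 engaged out of 500) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="greater"):
    """Return (z, p_value) for H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    # Under H0 the standard deviation of p-hat is computed from p0, not p-hat.
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    standard_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - standard_normal.cdf(z)
    elif alternative == "less":
        p_value = standard_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Hypothetical: 120 of 500 customers engaged with the new campaign.
z, p_value = one_proportion_z_test(120, 500, p0=0.20, alternative="greater")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Here $z = \sqrt{5} \approx 2.24$ and the one-sided p-value is about 0.013, so at the 5% significance level the company would reject $H_0$ and conclude engagement rose above 20%.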
Large Counts Condition
A condition required for the sampling distribution of a sample proportion to be approximately normal. It states that both the expected number of successes ($np$) and failures ($n(1-p)$) in the sample must be at least 10.
Example:
Before using a normal model for the sample proportion of 200 voters who support a candidate, you'd check the Large Counts Condition by confirming that $200p$ (expected supporters) and $200(1-p)$ (expected non-supporters) are both at least 10.
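The check itself is two multiplications and a comparison. This minimal sketch uses a hypothetical helper `large_counts_ok`; the support levels 0.55 and 0.03 are illustrative. When $p$ is unknown, as when building a confidence interval, the sample proportion $\hat{p}$ is commonly substituted.

```python
def large_counts_ok(n, p, threshold=10):
    """True when both n*p (successes) and n*(1 - p) (failures) are at least `threshold`."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

print(large_counts_ok(200, 0.55))  # 110 and 90 -> True
print(large_counts_ok(200, 0.03))  # about 6 and 194 -> False
```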
Nonresponse Bias
A type of bias that occurs when individuals selected for a sample do not participate, and their characteristics differ systematically from those who do respond, leading to an unrepresentative sample.
Example:
If a survey about healthy eating habits is only completed by people who are already health-conscious, the results will suffer from nonresponse bias and overestimate healthy eating in the population.
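A small simulation makes the mechanism concrete. In this minimal sketch every rate is an assumption chosen for illustration: 40% of a 10,000-person population is health-conscious, health-conscious people eat healthily with probability 0.8 (others 0.3), and they return the survey with probability 0.7 (others 0.1). The true healthy-eating rate is about 0.50, but because respondents are mostly health-conscious, the survey estimate lands near 0.71.

```python
import random

rng = random.Random(7)  # fixed seed for a reproducible run

# Hypothetical population; all probabilities below are illustrative assumptions.
population = []
for _ in range(10_000):
    health_conscious = rng.random() < 0.40
    eats_healthily = rng.random() < (0.80 if health_conscious else 0.30)
    population.append((health_conscious, eats_healthily))

true_rate = sum(eats for _, eats in population) / len(population)

# Health-conscious people are assumed far more likely to respond.
respondents = [
    (hc, eats) for hc, eats in population
    if rng.random() < (0.70 if hc else 0.10)
]
survey_rate = sum(eats for _, eats in respondents) / len(respondents)

print(f"true healthy-eating rate: {true_rate:.3f}")
print(f"rate among respondents:   {survey_rate:.3f}")
```

Taking a larger sample would not fix this: the bias comes from who responds, not from how many people are contacted.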
Population Proportion ($p$)
The true, unknown proportion of individuals in the entire population that possess a certain characteristic.
Example:
The actual percentage of all teenagers in the U.S. who use social media is the population proportion ($p$), which we often try to estimate.
Sample Proportion ($\hat{p}$)
The fraction of successes observed in a specific sample, which serves as the point estimate of the true population proportion.
Example:
If 35 out of 50 randomly selected students say they prefer chocolate ice cream, the sample proportion ($\hat{p}$) is 35/50 = 0.70.
Sampling Distribution for Proportions
The distribution of all possible sample proportions ($\hat{p}$) that could be obtained from samples of the same size $n$ taken from a given population. Its mean equals the population proportion ($p$), and its standard deviation is $\sqrt{p(1-p)/n}$ when observations are independent.
Example:
If you repeatedly took samples of 100 people and calculated the proportion who prefer coffee, the distribution of all those calculated proportions would form the sampling distribution for proportions.
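The coffee example can be simulated directly. This minimal sketch assumes a hypothetical population in which 60% prefer coffee and treats each person as an independent draw (as when sampling from a very large population). It repeats 5,000 samples of size 100 and compares the mean and standard deviation of the resulting $\hat{p}$ values with the theoretical values $p = 0.60$ and $\sqrt{p(1-p)/n} \approx 0.049$.

```python
import random
import statistics
from math import sqrt

def simulate_p_hats(p, n, reps, rng):
    """Draw `reps` samples of size `n`; return each sample's proportion of successes."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(2024)  # fixed seed for a reproducible run
p, n = 0.60, 100  # hypothetical: 60% of the population prefers coffee
p_hats = simulate_p_hats(p, n, reps=5_000, rng=rng)

print(f"mean of p-hats: {statistics.fmean(p_hats):.4f}  (theory: {p})")
print(f"SD of p-hats:   {statistics.pstdev(p_hats):.4f}  (theory: {sqrt(p * (1 - p) / n):.4f})")
```

A histogram of `p_hats` would also look roughly normal here, since $np = 60$ and $n(1-p) = 40$ both satisfy the Large Counts Condition.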