Glossary
10% Condition
A specific check for the independence condition when sampling without replacement, stating that the population must be at least 10 times as large as the sample (N ≥ 10n).
Example:
If you sample 50 fish from a lake to estimate their average length, the 10% condition requires that there are at least 500 fish in the lake.
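A minimal Python sketch of this check; the lake's population size N below is a made-up value for illustration:

n = 50      # sample size
N = 4200    # assumed (made-up) population size
print(N >= 10 * n)   # True: the population is at least 10 times the sample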
Central Limit Theorem (CLT)
A fundamental theorem stating that for a sufficiently large sample size (typically n ≥ 30), the sampling distribution of the sample mean will be approximately normal, regardless of the shape of the population distribution.
Example:
Thanks to the Central Limit Theorem, even if the distribution of individual incomes in a city is highly skewed, the distribution of sample mean incomes from large samples will be roughly normal.
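A small simulation sketch of this idea in Python, using a made-up right-skewed "income" population; the numbers are illustrative only:

import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=50_000, size=100_000)   # strongly skewed population
sample_means = [rng.choice(population, size=40).mean() for _ in range(2_000)]
# The sample means cluster symmetrically around the population mean,
# even though individual incomes are strongly skewed.
print(np.mean(sample_means), np.std(sample_means))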
Confidence Interval
A range of plausible values for an unknown population parameter, constructed from sample data, with a specified level of confidence that the interval contains the true parameter.
Example:
A 95% confidence interval for the average height of adult males might be (68 inches, 70 inches), suggesting we are 95% confident the true average height falls within this range.
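A sketch of building such an interval in Python with scipy; the summary statistics (x_bar = 69 in., s = 3 in., n = 36) are assumed values chosen to land near the interval above:

import numpy as np
from scipy import stats

x_bar, s, n = 69.0, 3.0, 36              # assumed summary statistics
se = s / np.sqrt(n)                      # standard error of the mean
lower, upper = stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=se)
print(round(lower, 1), round(upper, 1))  # roughly (68.0, 70.0)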
Confidence Level
The probability that a randomly constructed confidence interval will contain the true population parameter. Common levels are 90%, 95%, and 99%.
Example:
If you construct many 95% confidence intervals, you'd expect about 95% of them to capture the true population mean; this is the meaning of the confidence level.
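A simulation sketch of this interpretation, assuming a normal population with a known mean (mu = 100) purely for illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu, sigma, n, reps = 100, 15, 25, 1_000
hits = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, size=n)
    lo, hi = stats.t.interval(0.95, df=n - 1, loc=sample.mean(), scale=stats.sem(sample))
    hits += lo <= mu <= hi
print(hits / reps)   # close to 0.95, the stated confidence level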
Context (in interpretation)
The specific real-world scenario or variable being studied, which must be included in the interpretation of a confidence interval to make it meaningful.
Example:
When interpreting a confidence interval for the average weight of chocolate chip cookies, stating 'the true average weight of the bakery's chocolate chip cookies' provides essential context.
Critical Value (t*)
A multiplier used in the margin of error calculation, determined by the desired confidence level and the degrees of freedom, which defines the boundary of the confidence interval.
Example:
For a 95% confidence interval with 24 degrees of freedom, you would look up the appropriate critical value t* from a t-table or use a calculator's invT function.
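A sketch of that lookup in Python, with scipy playing the role of the calculator's invT; for a 95% level, 2.5% sits in each tail, so we ask for the 97.5th percentile:

from scipy import stats

t_star = stats.t.ppf(0.975, df=24)
print(round(t_star, 3))   # about 2.064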
Degrees of freedom (df)
A parameter that defines the shape of the t-distribution, typically calculated as the sample size minus one (n-1). As degrees of freedom increase, the t-distribution approaches the normal distribution.
Example:
If you have a sample of 20 students, your degrees of freedom would be 19, which influences the specific t-critical value you'd use.
Heavier tails
A characteristic of the t-distribution, meaning there is a greater probability of observing extreme values than under the normal distribution. This reflects the added uncertainty from estimating the population standard deviation with the sample standard deviation.
Example:
Because of its heavier tails, a t-distribution is more conservative than a normal distribution, giving wider confidence intervals for small samples to account for the extra uncertainty that comes from estimating the population standard deviation.
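A quick sketch comparing tail probabilities in Python: the chance of falling more than 2 standard errors above center under a t-distribution with 5 degrees of freedom versus a standard normal:

from scipy import stats

print(round(stats.t.sf(2, df=5), 3))   # about 0.051
print(round(stats.norm.sf(2), 3))      # about 0.023 -- the t's tails are heavier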
Independence (Condition)
A condition for inference requiring that each observation in the sample is independent of the others. For sampling without replacement, this is checked by the 10% condition.
Example:
When surveying students about their favorite subject, the independence condition means one student's answer shouldn't influence another's, and if sampling without replacement, the 10% condition should be met.
Margin of Error
The amount added to and subtracted from the point estimate to form a confidence interval, accounting for the uncertainty in the estimate due to sampling variability.
Example:
A survey reports that 60% of voters support a candidate with a margin of error of ±3%, meaning the true support is likely between 57% and 63%.
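For a mean, the margin of error is t* times the standard error; here is a sketch with assumed summary values (s = 4, n = 25, 95% confidence):

import numpy as np
from scipy import stats

s, n = 4.0, 25                         # assumed sample standard deviation and size
t_star = stats.t.ppf(0.975, df=n - 1)
margin = t_star * s / np.sqrt(n)
print(round(margin, 2))                # about 1.65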
Normal (Condition)
A condition for inference requiring that the sampling distribution of the sample mean is approximately normal. This can be met if the population is normal or if the sample size is large enough (n ≥ 30) by the Central Limit Theorem.
Example:
Even if the distribution of individual test scores is skewed, if your sample size is 40, the Normal condition is met for the sampling distribution of the mean score due to the CLT.
One-sample t-interval
A statistical procedure used to construct a confidence interval for an unknown population mean of a quantitative variable, based on data from a single sample when the population standard deviation is unknown.
Example:
To estimate the average amount of sleep high school seniors get per night, a researcher would construct a one-sample t-interval using data from a random sample of seniors.
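A minimal sketch of the computation in Python; the sleep hours below are made-up values, not real survey data:

import numpy as np
from scipy import stats

hours = np.array([6.5, 7.0, 5.5, 8.0, 6.0, 7.5, 6.5, 7.0, 5.0, 6.5])  # hypothetical data
n = len(hours)
x_bar = hours.mean()                      # point estimate
se = hours.std(ddof=1) / np.sqrt(n)       # standard error
t_star = stats.t.ppf(0.975, df=n - 1)     # 95% critical value
print(x_bar - t_star * se, x_bar + t_star * se)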
Point Estimate
A single value calculated from sample data that serves as the best guess or approximation for an unknown population parameter.
Example:
If a sample of 100 apples has an average weight of 150 grams, then 150 grams is the point estimate for the true average weight of all apples in the orchard.
Population Mean
The true average value of a quantitative variable for an entire population, which is the parameter that a confidence interval for means aims to estimate.
Example:
A confidence interval helps us estimate the population mean of all test scores, not just the average of the scores in our sample.
Random Sample (Condition)
A crucial condition for inference stating that the data must come from a randomly selected sample to ensure it is representative of the population and avoid bias.
Example:
Before calculating a confidence interval for the average height of trees in a forest, you must ensure you have a random sample of trees, perhaps by using a random number generator to select coordinates.
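A sketch of selecting such a sample in Python, assuming the trees have been numbered 0 through 1,199 (the population size is made up):

import numpy as np

rng = np.random.default_rng(42)
selected = rng.choice(1200, size=30, replace=False)   # simple random sample of tree IDs
print(sorted(selected))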
Standard Error
The estimated standard deviation of a sampling distribution, calculated using sample statistics (such as the sample standard deviation) instead of unknown population parameters.
Example:
When calculating a confidence interval for a mean, the standard error is the sample standard deviation divided by the square root of the sample size, indicating the typical variability of sample means.
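A sketch of that formula in Python with made-up data, checked against scipy's built-in sem function:

import numpy as np
from scipy import stats

data = np.array([150, 148, 152, 155, 149, 151])        # hypothetical apple weights
se_manual = data.std(ddof=1) / np.sqrt(len(data))
print(round(se_manual, 3), round(stats.sem(data), 3))  # the two values agree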
t-distribution
A probability distribution used for estimating a population mean when the population standard deviation is unknown. Its heavier tails matter most for small samples and account for the increased uncertainty compared to the normal distribution.
Example:
When analyzing the average commute time for a small group of 15 employees, you'd use a t-distribution because you don't know the true population standard deviation of commute times.