Glossary
Central Limit Theorem (CLT)
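A fundamental theorem stating that the sampling distribution of the sample mean will be approximately normal, regardless of the shape of the population's distribution, as long as the population has a finite variance and the sample size is sufficiently large (n ≥ 30 is a common rule of thumb; strongly skewed populations may need larger samples).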
Example:
Even if the distribution of individual incomes in a city is skewed, the Central Limit Theorem tells us that the distribution of sample mean incomes from many large samples will be approximately normal.
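The effect described above can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the "income" population is modeled as an exponential distribution with mean 1 (an arbitrary stand-in for a right-skewed variable), and the helper names `sample_means`, `skewness`, and `draw_income` are made up for this example.

```python
import random
import statistics

def sample_means(draw, sample_size, num_samples, rng):
    """Return the mean of each of num_samples samples of size sample_size."""
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def draw_income(rng):
    """Hypothetical skewed population: exponential with mean 1 (arbitrary units)."""
    return rng.expovariate(1.0)

rng = random.Random(0)  # fixed seed so the run is repeatable

incomes = [draw_income(rng) for _ in range(20_000)]
means = sample_means(draw_income, sample_size=50, num_samples=2_000, rng=rng)

print(f"skewness of individual incomes: {skewness(incomes):.2f}")
print(f"skewness of sample means:       {skewness(means):.2f}")
print(f"mean of sample means:           {statistics.fmean(means):.3f}")
print(f"SD of sample means:             {statistics.stdev(means):.3f}")
```

The individual values should come out strongly right-skewed, while the sample means should be much closer to symmetric and should cluster around the population mean with a spread near 1/√50.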
Confidence interval
A range of plausible values for an unknown population parameter, constructed from sample data, along with a confidence level indicating the proportion of such intervals that would capture the true parameter if the process were repeated many times.
Example:
A 95% confidence interval for the average weight of a certain type of apple might be (150g, 160g), meaning we are 95% confident the true average weight falls within this range.
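The "repeated many times" part of the definition can be illustrated by simulation. This is a rough sketch, not a prescribed method: the true mean (155 g) and population standard deviation (10 g) are invented values, the population is assumed to be normal, and the critical value 2.042 is the one quoted in this glossary for 30 degrees of freedom (so each sample has 31 apples).

```python
import math
import random
import statistics

T_STAR = 2.042     # 95% critical value for df = 30 (sample size 31)
TRUE_MEAN = 155.0  # hypothetical true mean apple weight, in grams
TRUE_SD = 10.0     # hypothetical population standard deviation, in grams

def confidence_interval(sample, t_star):
    """Return (lower, upper) = sample mean ± t* × s/√n."""
    n = len(sample)
    mean = statistics.fmean(sample)
    margin = t_star * statistics.stdev(sample) / math.sqrt(n)
    return mean - margin, mean + margin

def coverage(num_intervals, sample_size, rng):
    """Fraction of simulated intervals that contain TRUE_MEAN."""
    hits = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(sample_size)]
        lower, upper = confidence_interval(sample, T_STAR)
        if lower <= TRUE_MEAN <= upper:
            hits += 1
    return hits / num_intervals

rate = coverage(num_intervals=2_000, sample_size=31, rng=random.Random(1))
print(f"fraction of intervals capturing the true mean: {rate:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run success rate of the procedure.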
Critical value (t*)
A multiplier from the t-distribution (or z-distribution) that determines the width of the confidence interval, based on the desired confidence level and degrees of freedom. It marks the boundary of the central portion of the distribution.
Example:
For a 95% confidence interval with 30 degrees of freedom, the critical value (t*) is approximately 2.042, which helps define the margin of error.
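In practice t* is read from a t-table or obtained from statistical software. Purely as an illustration of where the number comes from, the sketch below finds it numerically using only the Python standard library: it integrates the t density with Simpson's rule and then bisects for the point that leaves the desired central area. The function names (`t_pdf`, `t_cdf`, `t_critical`) and the step counts are choices made for this example, not part of any library.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Return t* such that the central area between -t* and t* equals confidence."""
    target = 1 - (1 - confidence) / 2  # area to the left of t*
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:      # widen the bracket until it contains t*
        lo, hi = hi, hi * 2
    for _ in range(60):                # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t* for 95% confidence, df = 30: {t_critical(0.95, 30):.3f}")
```

For 95% confidence and 30 degrees of freedom this should agree with the table value of about 2.042 quoted in the example.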
Degrees of freedom (df)
A parameter that specifies the shape of a t-distribution, calculated as n-1 for a single sample mean. It represents the number of independent pieces of information available to estimate a parameter.
Example:
If you have a sample size of 25, your degrees of freedom for a t-distribution would be 24.
Margin of error
The maximum likely difference between a sample statistic and the true population parameter, calculated as the critical value multiplied by the standard error. It quantifies the precision of an estimate.
Example:
If a survey reports a 5% margin of error for a political candidate's approval rating, it means the true approval rating is likely within 5 percentage points of the reported sample percentage.
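For a sample mean, the definition above (critical value multiplied by standard error) is a one-line calculation. The sketch below borrows numbers that appear elsewhere in this glossary (a sample mean of 68 inches, a sample standard deviation of 3 inches, and t* = 2.042 for 30 degrees of freedom); combining them with a sample size of 31 is an assumption made only so that the degrees of freedom match.

```python
import math

def margin_of_error(t_star, sample_sd, n):
    """Margin of error for a sample mean: t* × (s / √n)."""
    return t_star * sample_sd / math.sqrt(n)

x_bar = 68.0    # sample mean height, in inches
s = 3.0         # sample standard deviation, in inches
n = 31          # assumed sample size, giving df = 30
t_star = 2.042  # 95% critical value for df = 30

me = margin_of_error(t_star, s, n)
interval = (x_bar - me, x_bar + me)
print(f"margin of error: {me:.2f} inches")
print(f"95% confidence interval: ({interval[0]:.2f}, {interval[1]:.2f})")
```

The interval is simply the point estimate plus or minus the margin of error, so a larger sample (larger n) or a lower confidence level (smaller t*) narrows it.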
Point estimate
A single value calculated from sample data that is used to estimate an unknown population parameter. For a population mean, the sample mean (x̄) serves as the point estimate.
Example:
If a sample of 50 students has an average height of 68 inches, then 68 inches is the point estimate for the true average height of all students.
Population mean
The true average value of a variable for an entire population, which is typically unknown and estimated using sample data. It is a key measure of central tendency.
Example:
The population mean height of all adult males in a country is a value we often try to estimate using a sample.
Random, independent sample
A sample where each member of the population has an equal chance of being selected, and the selection of one individual does not influence the selection of another. Random selection guards against systematic bias, makes the sample likely to be representative of the population, and allows for valid statistical inference.
Example:
To estimate the average GPA of students at a large university, a researcher might select a random, independent sample of 100 students by drawing names from a hat.
Standard error (SE)
The standard deviation of the sampling distribution of a statistic. For a sample mean, it indicates how much sample means are expected to vary from the true population mean and is calculated as s/√n, where s is the sample standard deviation and n is the sample size.
Example:
If the sample standard deviation of student heights is 3 inches and the sample size is 100, the standard error of the mean height would be 0.3 inches, indicating the typical variability of sample means.
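The s/√n calculation can be written out as a short sketch. The first function works from summary numbers, as in the example above; the second works from raw data. The list of heights is hypothetical and included only to show the call.

```python
import math
import statistics

def standard_error_from_summary(sample_sd, n):
    """Standard error of the mean from a sample SD and a sample size."""
    return sample_sd / math.sqrt(n)

def standard_error(sample):
    """Standard error of the mean computed directly from the data."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# The glossary example: s = 3 inches, n = 100, so SE = 3 / 10 = 0.3 inches.
print(standard_error_from_summary(3, 100))

# Hypothetical raw heights, in inches.
heights = [66, 70, 68, 71, 65, 69, 67, 68]
print(round(standard_error(heights), 3))
```

Note that `statistics.stdev` uses the n − 1 divisor, which is the sample standard deviation s that the formula calls for.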
Statistical claim for the population mean
A statement about the average value of a particular population, which is then tested using sample data.
Example:
A cereal company might make a statistical claim for the population mean that their boxes contain an average of 500 grams of cereal.
t-distributions
A family of probability distributions, indexed by degrees of freedom, used when estimating a population mean from a sample, especially when the population standard deviation is unknown and estimated from the sample. They are bell-shaped and symmetric, similar to the normal distribution but with heavier tails; as the degrees of freedom increase, they approach the standard normal distribution.
Example:
When constructing a confidence interval for the average test score of a class using a small sample, we would use t-distributions because the population standard deviation is unknown.
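The "heavier tails" property can be seen by comparing densities far from the center. The sketch below evaluates the t density (written out from its standard formula; `t_pdf` is a helper defined here, not a library function) at 3 for several degrees of freedom and compares it with the standard normal density at the same point. The choice of 3 and of these particular df values is arbitrary.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

normal_density = NormalDist().pdf(3.0)  # standard normal density at 3

for df in (2, 5, 30, 1000):
    print(f"df = {df:>4}: t density at 3 = {t_pdf(3.0, df):.5f}"
          f"  (normal: {normal_density:.5f})")
```

With few degrees of freedom the t density at 3 is several times the normal density, which is why t* values are larger than the corresponding z* values; by df = 1000 the two are nearly indistinguishable.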
t-scores
Values from a t-distribution that are used to calculate confidence intervals or perform hypothesis tests for population means when the population standard deviation is unknown. They are similar to z-scores but account for the additional uncertainty from estimating the standard deviation.
Example:
To find the critical value for a 95% confidence interval for a mean with 29 degrees of freedom, you would look up the appropriate t-score in a t-table.
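When a t-score is used in a hypothesis test rather than looked up as a critical value, it measures how many standard errors the sample mean lies from a claimed population mean: t = (x̄ − μ0) / (s/√n). The sketch below applies this to the cereal claim of 500 grams; the box weights are invented for illustration, and `t_score` is a name chosen for this example.

```python
import math
import statistics

def t_score(sample, claimed_mean):
    """t = (sample mean − claimed mean) / (s / √n)."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - claimed_mean) / standard_error

# Hypothetical box weights in grams, compared with the claimed mean of 500 g.
weights = [498, 501, 497, 499, 502, 496, 500, 498, 499]

t = t_score(weights, claimed_mean=500)
df = len(weights) - 1  # degrees of freedom: n − 1
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The resulting t-score would then be compared with a critical value (or converted to a p-value) from the t-distribution with the same degrees of freedom.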