Glossary
Central Limit Theorem (CLT)
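A fundamental theorem stating that when random samples of a large enough size are taken from a population with a finite mean and standard deviation, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the original population's distribution. The approximation improves as the sample size grows; it depends on the size of each sample, not on how many samples are taken.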
Example:
If you repeatedly take samples of 50 students from your school and calculate their average GPA, the distribution of those sample means will tend to be normal, even if individual student GPAs are skewed.
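The GPA example can be imitated with a short simulation. The sketch below is illustrative only: the population is a made-up right-skewed set of values (not real GPA data), and the helper names `skewness` and `simulate_sample_means` are hypothetical. It uses only Python's standard library.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: about 0 for symmetric data, positive for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def simulate_sample_means(population, sample_size, num_samples, rng):
    """Draw num_samples random samples and return the mean of each one."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population (exponential, mean about 1.0).
population = [rng.expovariate(1.0) for _ in range(20_000)]

# 2,000 samples of 50 values each, mirroring the "samples of 50 students" example.
sample_means = simulate_sample_means(population, sample_size=50, num_samples=2_000, rng=rng)

print("population skewness: ", round(skewness(population), 2))
print("sample-mean skewness:", round(skewness(sample_means), 2))
```

If the theorem applies here, the second number should sit much closer to 0 than the first, reflecting a far more symmetric, bell-like distribution of sample means.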
Independence (condition)
A condition for the Central Limit Theorem stating that the selection of one individual or observation in a sample does not influence the selection of any other. It is met exactly when sampling with replacement, and it is approximately met when sampling without replacement as long as the sample is no more than 10% of the population (the 10% condition).
Example:
When drawing names from a hat for a raffle, replacing each name after it's drawn ensures independence between selections.
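The 10% condition mentioned in the definition is usually stated as: when sampling without replacement, the sample should be no more than 10% of the population. A minimal sketch of that check follows; the function name `meets_ten_percent_condition` and the school sizes are hypothetical, chosen only for illustration.

```python
def meets_ten_percent_condition(sample_size, population_size):
    """Return True when the sample is at most 10% of the population."""
    if sample_size <= 0 or population_size <= 0:
        raise ValueError("sizes must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    return 10 * sample_size <= population_size

# Hypothetical school sizes, for illustration only.
print(meets_ten_percent_condition(50, 1200))  # 50 is under 10% of 1200
print(meets_ten_percent_condition(50, 400))   # 50 is over 10% of 400
```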
Inference
The process of drawing conclusions or making predictions about a population based on data from a sample. The Central Limit Theorem is crucial for making valid inferences about population means.
Example:
Based on a survey of 100 voters, concluding that a candidate will win the election is an act of statistical inference.
Population Distribution
The distribution of a variable for all individuals in an entire population. The Central Limit Theorem is powerful because it allows the sampling distribution of the mean to be approximately normal regardless of the shape of the original *population distribution* (if n is large enough).
Example:
The distribution of income for all households in a city represents the population distribution of income.
Population Mean (μ)
The true average value of a quantitative variable for an entire population. The Central Limit Theorem allows us to make inferences about this unknown parameter using sample means.
Example:
The average height of all adult males in a country is the population mean height.
Population Parameter
A numerical characteristic that describes an entire population (e.g., population mean, population standard deviation, population proportion). We use sample statistics to estimate these unknown values.
Example:
The true proportion of left-handed people in the world is a population parameter.
Quantitative Data (Means)
Numerical data that represents counts or measurements, for which it makes sense to calculate an average or mean. The Central Limit Theorem specifically applies to the distribution of sample means of this type of data.
Example:
The heights of students, the number of cars passing a point in an hour, or the temperature of a room are examples of quantitative data for which we might calculate means.
Random Sample (condition)
A sample chosen by a chance process in which every individual, and every possible group of individuals of the same size, has an equal chance of being selected. This condition helps make the sample representative and reduces selection bias.
Example:
Using a random number generator to pick 50 students from a school roster ensures a random sample.
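The roster example might look like the following in Python. This is a sketch under stated assumptions: the roster of 800 placeholder student IDs is invented for illustration, and the seed is fixed only so the result can be reproduced.

```python
import random

rng = random.Random(2024)  # fixed seed so the selection is repeatable

# Hypothetical roster of 800 student IDs (placeholder names, not real data).
roster = [f"student_{i:03d}" for i in range(1, 801)]

# sample() draws without replacement, so no student is picked twice,
# and every group of 50 students is equally likely to be chosen.
chosen = rng.sample(roster, k=50)

print(len(chosen), "students selected; first five:", chosen[:5])
```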
Sample Means
The average value (x̄) calculated from a single sample drawn from a population; each new sample generally gives a different sample mean. The Central Limit Theorem describes the distribution of these averages when many samples are taken.
Example:
After surveying 30 randomly chosen students, the calculated average height of 65 inches is a sample mean.
Sample Size (n > 30)
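The number of observations or individuals included in a sample. For the Central Limit Theorem, a *sample size* greater than 30 is a common rule of thumb for the sampling distribution of the mean to be approximately normal. It is a guideline rather than a strict cutoff: strongly skewed populations may need larger samples, and if the population itself is normal, the sampling distribution of the mean is normal for any sample size.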
Example:
To rely on the CLT under this rule of thumb, a study on the average weight of adult cats would aim for a sample of more than 30 cats, unless cat weights are already known to be approximately normal.
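The "greater than 30" guideline can be explored by simulation. The sketch below assumes a hypothetical exponential (strongly right-skewed) population and hypothetical helper names; it measures how skewed the sample means are for several sample sizes. It is an illustration of the trend, not a proof that 30 is a special threshold.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: about 0 for symmetric data, positive for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def sample_mean_skewness(sample_size, num_samples, rng):
    """Skewness of num_samples sample means, each drawn from an exponential population."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]
    return skewness(means)

rng = random.Random(7)  # fixed seed so the run is repeatable

# Compare small and large sample sizes on the same skewed population.
skew_by_n = {n: sample_mean_skewness(n, 2_000, rng) for n in (2, 10, 30, 100)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows, which is why larger samples make the normal approximation more trustworthy.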
Sampling Distribution
The distribution of a statistic (like the sample mean or sample proportion) across all possible samples of the same size from the same population; in practice it is approximated by taking many such samples. The Central Limit Theorem describes the shape of the *sampling distribution* of the mean.
Example:
If you repeatedly take samples of 30 students and calculate their average test score, the collection of all those average scores forms a sampling distribution of the mean test score.
Standard Deviation of Sample Means
The standard deviation of the sampling distribution of the sample mean, also known as the standard error of the mean. For independent observations it equals σ/√n, where σ is the population standard deviation and n is the sample size. It measures the typical variability of sample means around the true population mean and decreases as sample size increases.
Example:
If the standard deviation of sample means for average commute times is 0.5 minutes, it means typical sample averages are about 0.5 minutes away from the true average commute time.
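For independent observations, the standard deviation of sample means is σ/√n, where σ is the population standard deviation and n is the sample size. The sketch below compares that formula with a simulation; the population of commute times is made up for illustration, and the variable names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(11)  # fixed seed so the run is repeatable

# Hypothetical population of commute times in minutes (made-up values).
population = [rng.uniform(5, 60) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation
n = 40                                 # size of each sample

# Theoretical standard deviation of sample means: sigma / sqrt(n).
theoretical_se = sigma / math.sqrt(n)

# Empirical check: draw many samples with replacement (independent draws)
# and measure the spread of their means.
sample_means = [
    statistics.fmean(rng.choices(population, k=n))
    for _ in range(4_000)
]
empirical_se = statistics.pstdev(sample_means)

print(f"sigma / sqrt(n):              {theoretical_se:.3f}")
print(f"SD of simulated sample means: {empirical_se:.3f}")
```

The two printed values should be close. Because n sits under a square root, quadrupling the sample size cuts the standard deviation of sample means only in half.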