Glossary
10% Condition (Rule of Thumb)
A guideline stating that when sampling without replacement, the sample size should be no more than 10% of the population size to ensure approximate independence.
Example:
If we sample 50 students without replacement from a school, the 10% condition requires that the school have at least 500 students so that the observations can be treated as approximately independent.
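The check itself is a single inequality. Here is a minimal Python sketch, assuming the population size N is known. The function name `ten_percent_condition_met` is hypothetical.

```python
def ten_percent_condition_met(n: int, population_size: int) -> bool:
    """Return True if the sample is at most 10% of the population.

    Written as 10 * n <= N to avoid floating-point rounding in 0.10 * N.
    """
    return 10 * n <= population_size


# The school example: a sample of 50 students needs a school of at least 500.
print(ten_percent_condition_met(50, 500))  # True
print(ten_percent_condition_met(50, 450))  # False
```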
Alternative Hypothesis (Hₐ)
A statement that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
Example:
If the alternative hypothesis (Hₐ) states that the new fertilizer increases mean crop yield, we are looking for evidence that the true mean yield is greater than with the old fertilizer.
Categorical Data
Data that places individuals into groups or categories based on a quality or characteristic. Even if the categories are coded as numbers, arithmetic on those codes is not meaningful.
Example:
A survey asking students their favorite AP subject collects categorical data.
Central Limit Theorem (CLT)
A fundamental theorem stating that, for random samples, the sampling distribution of the sample mean becomes approximately normal as the sample size increases, regardless of the shape of the population distribution (a common rule of thumb is n ≥ 30).
Example:
Thanks to the Central Limit Theorem (CLT), even if individual customer waiting times are strongly right-skewed, the mean waiting time of a random sample of 40 customers has an approximately normal sampling distribution.
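The theorem's claim can be checked by simulation. The sketch below assumes a hypothetical exponential population of waiting times, which is strongly right-skewed with mean 1 and standard deviation 1. It draws many random samples of size 40 and records each sample mean. A fixed seed makes the run reproducible. The means should cluster around 1 with a spread close to σ/√n = 1/√40 ≈ 0.158.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible


def sample_means(n: int, reps: int) -> list[float]:
    """Draw `reps` random samples of size n from an exponential
    population (rate 1, so mean 1 and SD 1) and return their means."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


means = sample_means(n=40, reps=5000)
print(round(statistics.fmean(means), 3))  # close to the population mean, 1
print(round(statistics.stdev(means), 3))  # close to 1 / sqrt(40), about 0.158
```

A histogram of `means` would look roughly bell-shaped, even though the individual waiting times are far from normal.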
Confidence Level
The long-run proportion of intervals, constructed by the same method from repeated random samples, that would capture the true population parameter, usually expressed as a percentage. It describes the method, not any single computed interval.
Example:
A 95% confidence level means that if we repeated the sampling process many times, about 95% of the constructed intervals would contain the true population mean.
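The "repeated sampling" interpretation can be simulated directly. This is a minimal sketch under stated assumptions: a hypothetical normal population with mean 100 and standard deviation 15, with σ treated as known so that a z-interval (x̄ ± z*·σ/√n) applies. Each of the 2000 simulated intervals either captures the true mean or misses it. The fraction that capture it should be close to 0.95.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

# Hypothetical population: normal, mean 100, SD 15 (sigma treated as known).
MU, SIGMA, N, REPS = 100, 15, 25, 2000
z_star = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
margin = z_star * SIGMA / N ** 0.5

hits = 0
for _ in range(REPS):
    xbar = statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    if xbar - margin <= MU <= xbar + margin:
        hits += 1  # this interval captured the true mean

coverage = hits / REPS
print(coverage)  # close to 0.95
```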
Independence (Condition)
The condition that observations in the sample are independent of each other, meaning the outcome of one observation does not influence another.
Example:
When sampling items from a production line, the independence condition is met if selecting one item doesn't affect the quality of the next.
Left-Tailed Test
A hypothesis test where the alternative hypothesis states that the population parameter is 'less than' the hypothesized value.
Example:
If a consumer group suspects a company is underfilling packages, they would perform a left-tailed test (Hₐ: μ < stated weight).
Modified Boxplot
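A type of boxplot that plots outliers as individual points rather than incorporating them into the whiskers. Outliers are usually defined as values more than 1.5 × IQR below the first quartile (Q1) or above the third quartile (Q3). The whiskers extend only to the most extreme values that are not outliers.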
Example:
To check for extreme outliers that might violate the normality assumption for small sample sizes, a statistician might create a modified boxplot of the data.
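The outlier rule behind a modified boxplot can be sketched as follows. The sketch assumes the common 1.5 × IQR criterion and computes the quartiles as medians of the lower and upper halves of the data. This is one textbook convention; calculators and software may use slightly different quartile methods. The function names and the data are hypothetical.

```python
import statistics


def quartiles(data: list[float]) -> tuple[float, float]:
    """Q1 and Q3 as medians of the lower and upper halves.

    When n is odd, the overall median is left out of both halves.
    Requires at least 2 values.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]
    return statistics.median(lower), statistics.median(upper)


def flag_outliers(data: list[float]) -> tuple[float, float, list[float]]:
    """Return (lower fence, upper fence, outliers) using the 1.5 x IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers


# Made-up illustrative data with one suspicious value.
print(flag_outliers([2, 3, 4, 5, 6, 7, 8, 30]))  # (-2.5, 13.5, [30])
```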
Normal (Condition)
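The condition that the sampling distribution of the sample mean be approximately normal in order to use t-procedures. It is met if the population distribution is approximately normal or the sample size is large (commonly n ≥ 30). For small samples, a graph of the data should show no strong skewness or outliers.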
Example:
Even if the population data is skewed, the normal condition can often be met for the sampling distribution if the sample size is large enough.
Null Hypothesis (H₀)
A statement of no effect, no difference, or no relationship, which is assumed to be true until evidence suggests otherwise.
Example:
The null hypothesis (H₀) for a new fertilizer might state that the average crop yield is the same as with the old fertilizer.
One-Sample t-Test
A statistical hypothesis test used to compare a sample mean to a known or hypothesized population mean when the population standard deviation is unknown.
Example:
A researcher uses a one-sample t-test to see if the average weight of apples from a new orchard differs from the historical average of 150 grams.
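Here is a minimal sketch of the computation using only Python's standard library. The library has no t-distribution, so the t CDF is approximated here by integrating the t density numerically with Simpson's rule. The test statistic is t = (x̄ − μ₀)/(s/√n) with n − 1 degrees of freedom, and the p-value is two-sided because the question asks whether the mean *differs* from 150 grams. The helper names and the apple weights are hypothetical, made up for illustration. In practice a statistics package would be used. For example, SciPy's `scipy.stats.ttest_1samp` performs this test directly.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t_test(data: list[float], mu0: float) -> tuple[float, int, float]:
    """Return (t, df, two-sided p-value) for H0: mu = mu0 vs Ha: mu != mu0."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)  # estimated SE: s / sqrt(n)
    t = (statistics.fmean(data) - mu0) / se
    df = n - 1
    p = 2 * (1 - t_cdf(abs(t), df))  # area in both tails beyond |t|
    return t, df, p


# Made-up illustrative apple weights in grams (not real data).
weights = [152, 148, 155, 151, 149, 153, 156, 150, 154, 152]
t, df, p = one_sample_t_test(weights, mu0=150)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```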
Population Mean
The true average value of a variable for an entire group of individuals or objects of interest.
Example:
We often try to estimate the population mean height of all high school seniors based on a sample.
Population Standard Deviation (σ)
A measure of the spread or dispersion of values in an entire population.
Example:
If we knew the population standard deviation (σ) of all test scores, we could use a z-test instead of a t-test.
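In practice σ is rarely known. It is estimated by the sample standard deviation s. The two formulas differ in their divisor: σ averages squared deviations over all N population values, while s divides by n − 1. Here is a quick illustration with Python's standard `statistics` module, using made-up values.

```python
import statistics

# Made-up values; treat them first as a whole population, then as a sample.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(values)  # population SD: divides by N
s = statistics.stdev(values)       # sample SD: divides by n - 1

print(sigma)        # 2.0
print(round(s, 3))  # 2.138
```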
Quantitative Data
Numerical data that represents counts or measurements, allowing for mathematical operations.
Example:
The number of hours a student spends studying for the AP Stats exam is an example of quantitative data.
Random (Condition)
The condition that the data come from a random sample (or a randomized experiment). Random selection guards against selection bias and allows the results to be generalized to the population.
Example:
To generalize results about student preferences, it's crucial that the sample meets the random condition by using a simple random sample.
Rejection Region
The set of values for the test statistic that would lead to rejecting the null hypothesis at a given significance level.
Example:
If our calculated t-statistic falls within the rejection region, we have enough evidence to reject the null hypothesis.
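The boundaries of the rejection region depend on α and on the direction of the alternative hypothesis. This minimal sketch assumes a z statistic (σ known), so the cutoffs come from the standard normal distribution. For a t statistic, the cutoffs come instead from the t-distribution with n − 1 degrees of freedom and lie somewhat farther from 0 for small samples. The function name and the `alternative` labels are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def in_rejection_region(z: float, alpha: float, alternative: str) -> bool:
    """Is the z statistic inside the level-alpha rejection region?

    alternative: "less" (left-tailed), "greater" (right-tailed),
    or "two-sided".
    """
    if alternative == "less":
        return z <= STD_NORMAL.inv_cdf(alpha)
    if alternative == "greater":
        return z >= STD_NORMAL.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative!r}")


print(round(STD_NORMAL.inv_cdf(0.975), 3))           # 1.96, two-sided cutoff at alpha = 0.05
print(in_rejection_region(2.10, 0.05, "two-sided"))  # True
print(in_rejection_region(1.80, 0.05, "two-sided"))  # False
print(in_rejection_region(1.80, 0.05, "greater"))    # True
```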
Right-Tailed Test
A hypothesis test where the alternative hypothesis states that the population parameter is 'greater than' the hypothesized value.
Example:
A pharmaceutical company testing a new drug for pain relief would conduct a right-tailed test if they expect the drug to increase the average pain relief score (Hₐ: μ > baseline score).
Sample Mean
The average value of a variable calculated from a subset of a population.
Example:
After surveying 50 students, the sample mean GPA was found to be 3.7.
Significance Level (α)
The probability of rejecting the null hypothesis when it is actually true, representing the maximum risk of making a Type I error that one is willing to accept.
Example:
Setting the significance level (α) at 0.05 means that, if a new teaching method is actually no more effective than the old one, there is a 5% chance the test will incorrectly conclude that it is.
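Because α is a probability computed assuming H₀ is true, it can be checked by simulation. The sketch below generates data where H₀ really holds and runs a two-sided z-test (σ treated as known) on each sample. It then counts how often the true H₀ is rejected, which is a Type I error. The population values are hypothetical. The rejection rate should come out close to α = 0.05.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is reproducible

# H0 is true by construction: the population mean really is MU0.
MU0, SIGMA, N, REPS, ALPHA = 50, 10, 30, 4000, 0.05
z_star = statistics.NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided cutoff

rejections = 0
for _ in range(REPS):
    xbar = statistics.fmean(random.gauss(MU0, SIGMA) for _ in range(N))
    z = (xbar - MU0) / (SIGMA / N ** 0.5)
    if abs(z) >= z_star:
        rejections += 1  # a Type I error: rejecting a true H0

type_i_rate = rejections / REPS
print(type_i_rate)  # close to ALPHA = 0.05
```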
Significance Test
A formal procedure used to evaluate the evidence provided by data against a null hypothesis and in favor of an alternative hypothesis.
Example:
Before launching a new drug, a pharmaceutical company performs a significance test to determine if the drug's effect is statistically different from a placebo.
Two-Tailed Test
A hypothesis test where the alternative hypothesis states that the population parameter is simply 'not equal to' the hypothesized value, allowing for differences in either direction.
Example:
A two-tailed test would be used if we want to know if the average battery life of a new phone model is simply different from the advertised 24 hours (Hₐ: μ ≠ 24).
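Left-tailed, right-tailed, and two-tailed tests differ only in which tail area becomes the p-value. This minimal sketch assumes a z statistic with a standard normal null distribution. A t statistic would use the t-distribution instead. The function name and the `alternative` labels are hypothetical.

```python
from statistics import NormalDist


def z_p_value(z: float, alternative: str) -> float:
    """P-value for a z statistic under H0 (standard normal null distribution).

    alternative: "less" (left-tailed), "greater" (right-tailed),
    or "two-sided".
    """
    phi = NormalDist().cdf
    if alternative == "less":
        return phi(z)                  # area to the left of z
    if alternative == "greater":
        return 1 - phi(z)              # area to the right of z
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))   # area in both tails beyond |z|
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("less", "greater", "two-sided"):
    print(alt, round(z_p_value(1.96, alt), 3))
# less 0.975 / greater 0.025 / two-sided 0.05
```

With the same z = 1.96, the two-sided p-value is twice the right-tailed one. This is why a two-tailed test needs stronger evidence in any single direction.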
Type I Error
The error of rejecting a true null hypothesis, also known as a 'false positive'.
Example:
A Type I error would occur if a medical test incorrectly indicates a patient has a disease when they are actually healthy.
Type II Error
The error of failing to reject a false null hypothesis, also known as a 'false negative'.
Example:
A Type II error would occur if a medical test incorrectly indicates a patient is healthy when they actually have a disease.
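Unlike α, the probability of a Type II error (β) depends on what the true parameter actually is. For a z-test, β can be computed directly once a specific true mean is assumed. This minimal sketch covers a right-tailed test with σ known. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def type_ii_error_prob(mu0: float, mu_true: float, sigma: float,
                       n: int, alpha: float) -> float:
    """Beta for a right-tailed z-test of H0: mu = mu0 vs Ha: mu > mu0.

    Assumes sigma is known and the true mean is mu_true (> mu0).
    """
    std_normal = NormalDist()
    z_star = std_normal.inv_cdf(1 - alpha)        # rejection cutoff
    shift = (mu_true - mu0) / (sigma / n ** 0.5)  # true mean in SE units
    # We fail to reject when the z statistic lands below z_star;
    # when the true mean is mu_true, that statistic is Normal(shift, 1).
    return std_normal.cdf(z_star - shift)


beta = type_ii_error_prob(mu0=0, mu_true=0.5, sigma=1, n=25, alpha=0.05)
print(round(beta, 3))  # 0.196
```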
t-scores
Standardized scores computed like z-scores but with the sample standard deviation (s) in place of the unknown population standard deviation (σ). For a one-sample mean, they follow a t-distribution with n − 1 degrees of freedom.
Example:
When analyzing the average commute time, since the population standard deviation is unknown, we calculate t-scores to perform our hypothesis test.
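For a one-sample test about a mean, the t-score standardizes the sample mean x̄ against the hypothesized mean μ₀. It uses the sample standard deviation s and sample size n, and is compared to a t-distribution with n − 1 degrees of freedom.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
```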
z-test
A statistical hypothesis test used to compare a sample mean to a hypothesized population mean when the population standard deviation (σ) is known.
Example:
If a problem gives the population standard deviation (σ), a z-test for the mean is appropriate. If σ must be estimated from the sample, a t-test is used instead.
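The z statistic for a mean has the same form as a one-sample t-score, except that the known σ replaces s. Its null distribution is the standard normal rather than a t-distribution.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```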