Glossary
Alternative hypothesis (Hₐ)
A statement that contradicts the null hypothesis, representing what the researcher is trying to find evidence for.
Example:
If you suspect a coin is biased towards heads, your alternative hypothesis (Hₐ) would be that the probability of heads is greater than 0.5.
Bias
A systematic distortion in a statistical result due to a flaw in the data collection process or analysis, leading to non-random variation.
Example:
A survey conducted only among customers who love a product might introduce bias by overestimating overall customer satisfaction.
Expected failures (n(1-p))
The anticipated number of unsuccessful outcomes in a sample, calculated by multiplying the sample size by the probability of failure ($1-p$).
Example:
If you sample 200 students and expect 40% to not participate in extracurriculars, the expected failures (n(1-p)) would be 200 × 0.40 = 80.
Expected successes (np)
The anticipated number of successful outcomes in a sample, calculated by multiplying the sample size by the probability of success.
Example:
If you sample 200 students and expect 60% to participate in extracurriculars, the expected successes (np) would be 200 × 0.60 = 120.
Large Counts Condition
A condition for using the normal distribution to approximate the sampling distribution of a sample proportion, requiring both expected successes ($np$) and expected failures ($n(1-p)$) to be at least 10.
Example:
Before using a normal model for a proportion, you must check the Large Counts Condition to ensure your sample size is adequate.
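The check can be written as a small function. This is a minimal sketch; the function name `large_counts_ok` and the default threshold of 10 are illustrative choices, and the inputs reuse the student example above (n = 200, p = 0.60) plus a hypothetical smaller sample of 20 that fails the condition.

```python
def large_counts_ok(n: int, p: float, threshold: float = 10) -> bool:
    """Hypothetical helper: True if both np and n(1-p) are at least `threshold`."""
    expected_successes = n * p        # np
    expected_failures = n * (1 - p)   # n(1-p)
    return expected_successes >= threshold and expected_failures >= threshold


# Student example: np = 120 and n(1-p) = 80, both at least 10
ok_large = large_counts_ok(200, 0.60)
# Hypothetical small sample: np = 12 but n(1-p) = 8, so the condition fails
ok_small = large_counts_ok(20, 0.60)

print(ok_large, ok_small)
```

Both counts must pass; a sample with plenty of expected successes can still fail because of too few expected failures.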
Measurement error
The difference between a measured value and the true value of a quantity, which can introduce non-random variation.
Example:
Using a ruler with a chipped end to measure lengths will lead to consistent measurement error.
Non-Random Variation
Variation that reflects an underlying pattern or structure in the data, often caused by factors such as measurement error, bias, or systematic differences.
Example:
If a scale consistently reads 2 pounds higher than the actual weight, this introduces non-random variation into weight measurements.
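A short simulation can contrast the two kinds of variation. This sketch assumes a hypothetical true weight of 150 pounds and random reading-to-reading noise with a standard deviation of 0.5 pounds; only the 2-pound offset comes from the example above. Random noise averages out, while the offset does not.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_WEIGHT = 150.0  # hypothetical true weight, in pounds
OFFSET = 2.0         # the scale reads 2 pounds high (non-random, systematic)
NOISE_SD = 0.5       # hypothetical size of random noise per reading
READINGS = 1000

# Random variation only: readings scatter around the true value
fair_readings = [TRUE_WEIGHT + random.gauss(0, NOISE_SD) for _ in range(READINGS)]
# Random variation plus a constant offset: readings scatter around the true value + 2
biased_readings = [TRUE_WEIGHT + OFFSET + random.gauss(0, NOISE_SD) for _ in range(READINGS)]

fair_error = statistics.mean(fair_readings) - TRUE_WEIGHT
biased_error = statistics.mean(biased_readings) - TRUE_WEIGHT

print(f"average error, fair scale:   {fair_error:+.2f} lb")
print(f"average error, biased scale: {biased_error:+.2f} lb")
```

Averaging many readings shrinks the fair scale's error toward zero, but the biased scale's average error stays near 2 pounds no matter how many readings are taken.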
Normal Curve
A symmetrical, bell-shaped probability distribution that is fundamental to many statistical calculations and inference.
Example:
Many natural phenomena, like human heights or test scores, often approximate a normal curve.
Null hypothesis (H₀)
A statement of no effect or no difference, representing the status quo or a claim to be tested.
Example:
The null hypothesis (H₀) for a coin flip might be that the coin is fair, meaning the probability of heads is 0.5.
P-value
The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample data, assuming the null hypothesis is true.
Example:
A small p-value (e.g., 0.01) suggests that the observed data would be very unlikely if the null hypothesis were true, leading to its rejection when the p-value falls below the chosen significance level.
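For the biased-coin example (H₀: p = 0.5, Hₐ: p > 0.5), a one-sided p-value can be computed with Python's standard-library `statistics.NormalDist`. This is a sketch that assumes hypothetical data of 60 heads in 100 flips and a significance level of 0.05; the Large Counts Condition holds because np = n(1-p) = 50.

```python
from math import sqrt
from statistics import NormalDist

n = 100       # hypothetical number of flips
heads = 60    # hypothetical number of heads observed
p0 = 0.5      # H0: the coin is fair
alpha = 0.05  # significance level

p_hat = heads / n
# The standard error under H0 uses the hypothesized p0, not p_hat
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
# Ha: p > 0.5, so the p-value is the area under the normal curve above z
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here z = 2.00 and the p-value is about 0.0228, which is below 0.05, so these hypothetical data would lead to rejecting H₀.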
Probability of success (p)
The true proportion of successes in a population, often a hypothesized value in hypothesis testing.
Example:
If 60% of high school students participate in extracurriculars, then the probability of success (p) for a randomly chosen student is 0.60.
Random Variation
Variation in which data values scatter without a discernible pattern, arising from chance alone, as in random sampling.
Example:
When a fair coin is flipped 100 times, small deviations of the number of heads from 50 are due to random variation.
Sample proportion (p̂)
The proportion of successes observed in a specific sample, calculated as the number of successes divided by the sample size.
Example:
If 108 out of 200 sampled students participate in extracurriculars, the sample proportion (p̂) is 108/200 = 0.54.
Sample size (n)
The number of individuals or observations included in a sample from a larger population.
Example:
If you survey 50 students from a school, your sample size (n) is 50.
Sampling distributions
The distribution of a statistic (like a sample mean or proportion) obtained from all possible samples of a given size from a population.
Example:
If you repeatedly take samples of 30 students and calculate their average height, the distribution of those average heights would be a sampling distribution.
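A sampling distribution can be approximated by simulation. This sketch uses a sample proportion rather than mean height, and assumes a hypothetical population proportion of 0.60, samples of size 200, and 2,000 repetitions. The spread of the simulated p̂ values should be close to the theoretical standard deviation √(p(1-p)/n).

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

p = 0.60      # hypothetical population proportion of successes
n = 200       # sample size
reps = 2000   # number of simulated samples


def sample_proportion(n: int, p: float) -> float:
    """Draw one random sample of size n and return its proportion of successes."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n


# Each entry is one sample's p-hat; together they approximate the sampling distribution
p_hats = [sample_proportion(n, p) for _ in range(reps)]

mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.stdev(p_hats)
theoretical_sd = (p * (1 - p) / n) ** 0.5

print(f"mean of p-hats:       {mean_p_hat:.3f}")
print(f"SD of p-hats:         {sd_p_hat:.4f}")
print(f"sqrt(p(1-p)/n):       {theoretical_sd:.4f}")
```

The simulated values center on p and have a spread near 0.0346; a histogram of `p_hats` would look approximately normal, which is what the Large Counts Condition is meant to ensure.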
Significance level
A predetermined threshold (often 0.05 or 0.01) used in hypothesis testing to decide whether to reject the null hypothesis.
Example:
If your significance level is 0.05, you will reject the null hypothesis if your p-value is less than 0.05.
Statistical inference
The process of drawing conclusions about a population based on data from a sample, often relying on properties of distributions like the normal curve.
Example:
Using a sample of student grades to estimate the average GPA of all students in a school is an example of statistical inference.
Systematic differences
Consistent, non-random disparities in data that arise from an underlying structure or process, contributing to non-random variation.
Example:
If one production line consistently produces slightly larger parts than another, this indicates systematic differences in the manufacturing process.
Test statistic
A value calculated from sample data during a hypothesis test, used to measure how far the sample result deviates from what is expected under the null hypothesis.
Example:
In a z-test for proportions, the test statistic is a z-score that quantifies the difference between the sample proportion and the hypothesized population proportion.
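The one-proportion z statistic is z = (p̂ − p₀) / √(p₀(1 − p₀)/n). As an illustrative sketch, the function below (a hypothetical helper, `one_proportion_z`) tests the glossary's sample of 108 out of 200 students against a hypothesized proportion of 0.60.

```python
from math import sqrt


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n) for a one-proportion z-test."""
    p_hat = successes / n
    # Standard deviation of p_hat assuming the null value p0 is true
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


z = one_proportion_z(108, 200, 0.60)
print(f"z = {z:.3f}")
```

Here p̂ = 0.54 and the standard error is √(0.0012) ≈ 0.0346, giving z ≈ −1.732: the sample proportion lies about 1.7 standard errors below the hypothesized value.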
Z-score
A standardized score that indicates how many standard deviations a value lies above (positive) or below (negative) the mean of a distribution.
Example:
A student who scores 2 standard deviations above the mean on a test has a z-score of 2.0.
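The z-score formula is z = (x − mean) / standard deviation. This sketch assumes a hypothetical test with a mean of 70 and a standard deviation of 8, so a score of 86 sits 2 standard deviations above the mean.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (positive) or below (negative) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


print(z_score(86, 70, 8))  # hypothetical score of 86 -> 2.0
print(z_score(62, 70, 8))  # hypothetical score of 62 -> -1.0
```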
Z-score chart
A table used to find the probability (area under the standard normal curve, usually to the left of the value) associated with a given z-score.
Example:
To find the percentage of data below a z-score of 1.5, you would consult a z-score chart.
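In software, the standard normal cumulative distribution function gives the same left-tail area a z-score chart lists. This sketch uses Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

area_below = standard_normal.cdf(1.5)  # area to the left of z = 1.5
area_above = 1 - area_below            # area to the right of z = 1.5

print(f"P(Z < 1.5) = {area_below:.4f}")
print(f"P(Z > 1.5) = {area_above:.4f}")
```

The left-tail area is about 0.9332, so roughly 93.3% of a normal distribution lies below a z-score of 1.5, matching the chart entry.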