Glossary
Chi-square goodness-of-fit test
A statistical test used to determine whether the observed frequency distribution of a single categorical variable is consistent with a hypothesized (expected) distribution.
Example:
You would use a chi-square goodness-of-fit test to see if the distribution of M&M colors in a bag matches the manufacturer's stated proportions.
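A minimal Python sketch of the goodness-of-fit calculation. It assumes a hypothetical bag of 60 candies and a hypothetical claim that all six colors are equally common; these are not the manufacturer's actual figures, and the function name goodness_of_fit is illustrative. The function turns the claimed proportions into expected counts, sums (O − E)²/E over the categories, and reports df = k − 1. Converting the statistic into a p-value then uses the chi-square distribution with that df.

```python
# Chi-square goodness-of-fit sketch.
# HYPOTHETICAL data: the counts and the "equal proportions" claim are
# for illustration only, not real manufacturer proportions.

def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom)."""
    if len(observed) != len(proportions):
        raise ValueError("need one proportion per category")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n = sum(observed)
    # Expected count in each category if the claim (null hypothesis) is true.
    expected = [p * n for p in proportions]
    # Sum of squared discrepancies, each scaled by its expected count.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df

# Hypothetical bag of 60 candies: red, orange, yellow, green, blue, brown.
observed = [8, 12, 9, 11, 13, 7]
claimed = [1 / 6] * 6

stat, df = goodness_of_fit(observed, claimed)
print(f"chi-square = {stat:.3f} with df = {df}")
```

With six equally likely colors, each expected count is 10, so only the observed counts drive the size of the statistic.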
Chi-square test for independence
A statistical test used to determine if there is a significant association between two categorical variables in a population.
Example:
To see if there's a relationship between a person's preferred type of music and their political affiliation, you would use a chi-square test for independence.
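A minimal Python sketch of the test for independence on a two-way table. The 2×2 table below is hypothetical (rows stand in for two music preferences, columns for two political affiliations), and the function name independence_test is illustrative. Under independence, each cell's expected count is (row total × column total) / grand total, and the degrees of freedom are (rows − 1)(columns − 1).

```python
# Chi-square test for independence (sketch).
# HYPOTHETICAL table: rows = preferred music type, columns = political affiliation.

def independence_test(table):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count if the two variables are independent.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

table = [
    [30, 20],  # hypothetical music type 1
    [20, 30],  # hypothetical music type 2
]
stat, df, expected = independence_test(table)
print(expected)
print(f"chi-square = {stat:.2f}, df = {df}")
```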
Chi-square test statistic (χ²)
A calculated value that quantifies the discrepancy between observed and expected frequencies in a chi-square test.
Example:
A chi-square test statistic that is large relative to its degrees of freedom suggests a significant difference between observed and expected counts, making the null hypothesis less plausible.
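For reference, the standard formula for the statistic, where O_i and E_i are the observed and expected counts in category (or cell) i and the sum runs over all k categories or cells:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Each term is zero when a category matches its expectation exactly, so χ² grows only as the observed counts drift away from the expected ones.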
Degrees of Freedom (df)
The number of independent values or pieces of information that are free to vary in a statistical calculation.
Example:
For a chi-square goodness-of-fit test with 4 categories, the degrees of freedom would be 3.
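The usual degrees-of-freedom rules for the two chi-square tests in this glossary, with k categories for goodness of fit and an r-row by c-column table for independence, followed by the 4-category example worked out:

```latex
\text{goodness of fit: } df = k - 1
\qquad
\text{independence: } df = (r - 1)(c - 1)
\qquad
k = 4 \;\Rightarrow\; df = 4 - 1 = 3
```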
Expected Counts
The frequencies or numbers of occurrences that would be anticipated in each category if the null hypothesis were true.
Example:
If a company claims that 30% of its candies are red, the expected count of red candies in a bag of 200 is 0.30 × 200 = 60.
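A small Python sketch of the expected-count rule E = p × n. Only the 30% red figure comes from the example above; the other three color proportions are hypothetical, added so the proportions sum to 1, and the function name expected_counts is illustrative.

```python
# Expected counts: E_i = p_i * n for each category.
# Only "red": 0.30 comes from the glossary; the other proportions are HYPOTHETICAL.

def expected_counts(proportions, n):
    """Map each category to its expected count in a sample of size n."""
    return {category: p * n for category, p in proportions.items()}

claimed = {"red": 0.30, "blue": 0.25, "green": 0.25, "yellow": 0.20}
print(expected_counts(claimed, 200))
```

Because the proportions sum to 1, the expected counts always add back up to the sample size.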
Law of Large Numbers
A theorem stating that as the sample size increases, the sample mean will converge to the true population mean.
Example:
The more times you flip a fair coin, the closer the proportion of heads will get to 0.5, illustrating the Law of Large Numbers.
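A short Python simulation of the coin-flip example, using a seeded pseudo-random generator so the run is reproducible. The function name and the checkpoint sizes are illustrative choices. The proportion of heads can wander noticeably after 10 flips but settles near 0.5 as the number of flips grows.

```python
import random

def running_proportion_of_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips; return {flip count: proportion of heads}
    recorded at 10, 100, 1,000, ... flips (powers of ten up to n_flips)."""
    rng = random.Random(seed)
    checkpoints = {10 ** p for p in range(1, 7)}
    heads = 0
    proportions = {}
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        if i in checkpoints:
            proportions[i] = heads / i
    return proportions

for n, prop in running_proportion_of_heads(100_000).items():
    print(f"{n:>7} flips: proportion of heads = {prop:.4f}")
```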
Null Hypothesis (H0)
A statement of no effect, no difference, or no relationship, which is assumed to be true until evidence suggests otherwise.
Example:
For a new drug, the null hypothesis might state that the drug has no effect on blood pressure.
Observed Counts
The actual frequencies or numbers of occurrences recorded in each category from a collected sample or experiment.
Example:
In a survey of 100 people, if 30 prefer coffee, then 30 is the observed count for coffee preference.
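Observed counts are usually produced by tallying raw responses. A minimal Python sketch using collections.Counter; the 30 coffee responses match the example above, while the tea and "neither" responses are hypothetical, added to reach 100 people.

```python
from collections import Counter

# HYPOTHETICAL raw survey responses, one entry per respondent.
responses = ["coffee"] * 30 + ["tea"] * 45 + ["neither"] * 25

observed = Counter(responses)
print(observed["coffee"])        # 30
print(sum(observed.values()))    # 100
```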
P-value
The probability of observing results as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true.
Example:
A p-value of 0.03 means that, if the null hypothesis were true, there would be a 3% chance of getting results at least as extreme as ours.
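For a chi-square test, the p-value is the upper-tail area of the chi-square distribution beyond the observed statistic. The Python sketch below computes it with only the standard library for integer degrees of freedom. It starts from the closed forms for df = 1 and df = 2 and steps up two degrees of freedom at a time using a standard recurrence. The function name chi2_sf and the statistic 2.8 with df = 5 are illustrative. If SciPy is available, scipy.stats.chi2.sf(x, df) computes the same quantity.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed-form starting points: df = 2 (exponential tail) or df = 1 (normal tail).
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

alpha = 0.05
p = chi2_sf(2.8, 5)  # hypothetical statistic and df
print(f"p-value = {p:.3f}; significant at alpha = {alpha}? {p < alpha}")
```

A large p-value like this one means a discrepancy of that size would be unremarkable if the null hypothesis were true.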
Power
The probability of correctly rejecting a false null hypothesis; it's the ability of a test to detect a true effect if one exists.
Example:
A study with high power is more likely to find a significant difference if one truly exists between two treatments.
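Power can be computed exactly for simple tests. The Python sketch below swaps in a one-sided exact binomial test of H0: p = 0.5 against a coin whose true heads probability is assumed to be 0.6. The test, the alternative value, and the function names are all illustrative choices, not part of the glossary. It finds the rejection cutoff under H0, then asks how often the true coin lands beyond that cutoff. Larger samples give higher power.

```python
from math import comb

def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def critical_value(n, alpha=0.05, p0=0.5):
    """Smallest cutoff c with P(X >= c | p0) <= alpha (one-sided test)."""
    for c in range(n + 1):
        if upper_tail(n, c, p0) <= alpha:
            return c
    return n + 1  # no cutoff achieves alpha: the test never rejects

def power(n, p_true, alpha=0.05, p0=0.5):
    """Probability of rejecting H0: p = p0 when the true probability is p_true."""
    c = critical_value(n, alpha, p0)
    return upper_tail(n, c, p_true)

for n in (50, 100, 200):
    print(f"n = {n}: power against p = 0.6 is {power(n, 0.6):.3f}")
```

When p_true equals p0, the same function returns the test's actual false-rejection rate, which stays at or below alpha.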
Practical Significance
Whether a statistically significant result is large enough to matter in a real-world context.
Example:
A new teaching method might show a statistically significant improvement of 0.1 points on a 100-point test, but this might lack practical significance.
Random Variation
Differences in observed data that occur purely by chance, without any underlying systematic cause or pattern.
Example:
Flipping a fair coin 10 times and getting 6 heads instead of 5 is likely due to random variation.
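The exact probability makes this concrete. The Python sketch below counts the outcomes with at least 6 heads in 10 fair flips and divides by the 2^10 equally likely outcomes. The function name prob_at_least is illustrative. The result is 386/1024 = 193/512, about 38%, so 6 or more heads is a common outcome for a fair coin.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n):
    """Exact P(at least k heads in n fair coin flips)."""
    favorable = sum(comb(n, j) for j in range(k, n + 1))
    return Fraction(favorable, 2**n)

p = prob_at_least(6, 10)
print(p, float(p))  # 193/512 0.376953125
```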
Sample Size
The number of observations or individuals included in a statistical sample.
Example:
If a survey interviews 500 people, the sample size is 500.
Standard Deviation
A measure of the typical distance or spread of data points from the mean of a distribution.
Example:
A low standard deviation for test scores means most students scored close to the average.
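A short Python comparison using the standard library's statistics module, with two hypothetical classes that share a mean of 75. statistics.stdev computes the sample standard deviation (dividing by n − 1), and statistics.pstdev gives the population version.

```python
import statistics

# HYPOTHETICAL test scores for two classes with the same mean (75).
class_a = [73, 74, 75, 76, 77]   # clustered near the mean
class_b = [55, 65, 75, 85, 95]   # widely spread

for name, scores in (("A", class_a), ("B", class_b)):
    print(name, statistics.mean(scores), round(statistics.stdev(scores), 2))
```

Both classes have the same average, but class A's much smaller standard deviation shows its scores sit close to that average.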
Statistical Significance
A result is statistically significant if the p-value is less than the chosen significance level (alpha), indicating that the observed effect is unlikely to be due to chance alone.
Example:
If a drug trial shows a statistically significant improvement in symptoms, the improvement is unlikely to be explained by random variation alone.
Variation
The difference between what is observed in the data and what would be expected based on a claim or model.
Example:
When comparing the actual number of red cars on a street to the predicted number, the difference represents the variation.