Glossary
Alternative Hypothesis (Ha)
The alternative hypothesis (Ha) is a statement that contradicts the null hypothesis, suggesting there is a difference, effect, or relationship.
Example:
If the null hypothesis states a coin is fair, the alternative hypothesis would state that the coin is not fair (i.e., the probability of heads is not 0.5).
Chi-Square Distributions
Chi-square distributions are a family of right-skewed probability distributions used for hypothesis tests involving categorical data, with their shape determined by degrees of freedom.
Example:
When analyzing the distribution of errors in a manufacturing process, you might use chi-square distributions to determine if the observed error types match a theoretical pattern.
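A rough sketch of how the shape depends on degrees of freedom, assuming Python with scipy is available:

    from scipy.stats import chi2

    # Evaluate the chi-square density at a few points for different degrees of freedom.
    # Small df give a sharply right-skewed curve; larger df look progressively more symmetric.
    for df in (1, 4, 10):
        densities = [round(chi2.pdf(x, df), 3) for x in (1, 5, 10, 20)]
        print(f"df={df}: density at x=1,5,10,20 -> {densities}")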
Chi-Square Goodness-of-Fit Test
The chi-square goodness-of-fit test assesses whether the observed frequency distribution of a single categorical variable matches a hypothesized or claimed population distribution.
Example:
A researcher might use a chi-square goodness-of-fit test to see if the observed distribution of M&M colors in a bag matches the proportions claimed by the candy company.
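A minimal sketch of this test in Python using scipy.stats.chisquare; the color counts and claimed proportions below are invented purely for illustration:

    from scipy.stats import chisquare

    observed = [85, 95, 70, 105, 80, 65]             # hypothetical color counts in a bag of 500
    claimed = [0.20, 0.20, 0.15, 0.20, 0.15, 0.10]   # hypothetical proportions claimed by the company
    expected = [p * sum(observed) for p in claimed]  # expected counts if the claim is true

    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(stat, p_value)  # a small p-value would suggest the bag's colors do not match the claim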
Chi-Square Statistic
The chi-square statistic quantifies the discrepancy between observed frequencies and expected frequencies in categorical data, indicating how much the actual results deviate from what's anticipated.
Example:
A large chi-square statistic in a survey comparing observed political affiliations to national averages would suggest that the sample's distribution differs substantially from those national figures.
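The statistic itself is the sum of (observed - expected)^2 / expected over all categories; a tiny worked sketch with made-up counts:

    # Hypothetical observed and expected counts for four categories
    observed = [30, 45, 15, 10]
    expected = [25, 50, 15, 10]

    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(chi_square)  # 1.5 -- larger values mean the data stray further from what was anticipated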
Degrees of Freedom (df)
Degrees of freedom (df) is a parameter that specifies the shape of a chi-square distribution, typically calculated as (number of categories - 1) for goodness-of-fit tests.
Example:
If you're testing whether a coin is fair by observing two outcomes (heads/tails), the degrees of freedom would be 2 - 1 = 1.
Expected Counts
Expected counts represent the anticipated frequencies in each category if the null hypothesis were true, serving as a baseline for comparison with observed data.
Example:
If a fair six-sided die is rolled 120 times, the expected count for each face (e.g., rolling a '3') would be 120 / 6 = 20.
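The same arithmetic as a short sketch, assuming equal face probabilities under the null hypothesis:

    n_rolls = 120
    hypothesized_probs = [1 / 6] * 6                             # fair die: each face equally likely under H0
    expected_counts = [n_rolls * p for p in hypothesized_probs]
    print(expected_counts)                                       # [20.0, 20.0, 20.0, 20.0, 20.0, 20.0]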
Independence (Condition)
The 'Independence' condition states that individual observations in the sample must be independent of each other; when sampling without replacement, it is often checked by ensuring the population size is at least 10 times the sample size (the 10% rule).
Example:
When sampling students from a large university, the independence condition is met if the sample size is less than 10% of the total student population.
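A one-line check of the 10% rule, with made-up sample and population sizes:

    population_size = 30000  # hypothetical total enrollment
    sample_size = 200        # hypothetical number of students surveyed
    print(sample_size <= 0.10 * population_size)  # True -> observations can be treated as roughly independent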
Large Counts (Condition)
The 'Large Counts' condition for chi-square tests requires that all expected counts in each category are at least 5, ensuring the sampling distribution of the test statistic is approximately chi-square.
Example:
If you're testing preferences for five different types of music, you must ensure that the large counts condition is met by having at least 5 expected listeners for each music type.
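A short sketch of that check, with hypothetical expected counts for the five music types:

    expected = [12.5, 30.0, 7.5, 40.0, 10.0]       # hypothetical expected listeners per music type
    print(all(count >= 5 for count in expected))   # True -> the Large Counts condition is satisfied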
Null Hypothesis
The null hypothesis (H0) is a statement of no effect, no difference, or no relationship between variables, which is assumed true until evidence suggests otherwise.
Example:
For a study on a new fertilizer, the null hypothesis might state that the fertilizer has no effect on plant growth.
P-value
The p-value is the probability of observing data as extreme as, or more extreme than, the sample data, assuming the null hypothesis is true.
Example:
If a study yields a p-value of 0.01, there is only a 1% chance of seeing results at least this extreme if the null hypothesis were actually true, which would lead to its rejection at the common 0.05 significance level.
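A brief sketch of converting a chi-square statistic into a p-value with scipy; the statistic and degrees of freedom here are made up:

    from scipy.stats import chi2

    statistic = 11.3  # hypothetical chi-square statistic from a test
    df = 3            # hypothetical degrees of freedom
    p_value = chi2.sf(statistic, df)  # upper-tail probability, assuming the null hypothesis is true
    print(round(p_value, 4))          # roughly 0.01 for these values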
Random (Condition)
The 'Random' condition requires that the data come from a random sample or a randomized experiment to ensure the sample is representative of the population.
Example:
Before conducting a survey, ensuring that participants are selected using a simple random sampling method helps meet this crucial condition for inference.
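A minimal sketch of drawing such a sample in Python, assuming a hypothetical roster of student IDs:

    import random

    student_ids = list(range(1, 5001))          # hypothetical roster of 5,000 students
    sample = random.sample(student_ids, k=100)  # each student is equally likely to be selected
    print(len(sample))                          # 100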