Glossary
Alternative Hypothesis (Ha)
The statement that there is an effect, a difference, or an association between variables or populations. It is what the researcher is trying to find evidence for.
Example:
If a new fertilizer is being tested, the Alternative Hypothesis (Ha) would claim that the new fertilizer does increase crop yield compared to the old one.
Chi-Square Test Statistic (χ²)
A calculated value that measures the discrepancy between the observed counts and the expected counts in a chi-square test, computed as χ² = Σ (observed - expected)² / expected, summed over all cells. A larger value indicates greater deviation from the null hypothesis.
Example:
After collecting data on movie genre preferences by age group, the calculated Chi-Square Test Statistic (χ²) was 15.7, indicating a notable difference from what would be expected if age and genre preference were independent.
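The statistic can be computed directly from its definition. The 2x3 table below (two age groups, three genres) uses made-up counts purely for illustration:

```python
# Hypothetical observed counts: 2 age groups (rows) x 3 movie genres (columns).
observed = [[30, 20, 10],
            [20, 30, 40]]

row_totals = [sum(row) for row in observed]        # [60, 90]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 50, 50]
grand_total = sum(row_totals)                      # 150

# Expected count for each cell under H0: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)  # 16.67 2
```

The same numbers would come out of a library routine such as scipy.stats.chi2_contingency; the hand computation simply makes the formula visible.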
Chi-Square Test for Homogeneity
A statistical test used to compare the distribution of a single categorical variable across two or more independent populations or treatment groups.
Example:
To determine if the distribution of preferred social media platforms (Instagram, TikTok, X) is the same for college freshmen and college seniors, a statistician would perform a Chi-Square Test for Homogeneity.
Chi-Square Test for Independence
A statistical test used to determine if there is an association between two categorical variables within a single sample or population.
Example:
A researcher uses a Chi-Square Test for Independence to see if there's a relationship between a student's favorite subject (math, science, history) and their preferred learning style (visual, auditory, kinesthetic) among all high schoolers in a district.
Conclusion (for hypothesis tests)
The final statement in a hypothesis test, which interprets the p-value in relation to the significance level and states whether there is sufficient evidence to reject or fail to reject the null hypothesis, always in context.
Example:
Based on a p-value of 0.03 and a significance level of 0.05, the Conclusion would be: 'Since the p-value (0.03) is less than alpha (0.05), we reject the null hypothesis. There is sufficient evidence to conclude that the new teaching method significantly improves student scores.'
Conditions for Inference
Specific requirements that must be met for a statistical test to be valid and for its results to be reliable. Failing to meet these can invalidate conclusions.
Example:
Before conducting any hypothesis test, always check the Conditions for Inference like randomness, independence, and large counts to ensure the results are trustworthy.
Degrees of Freedom (df)
A value that specifies the number of independent pieces of information used to calculate a statistic. For a chi-square test for independence or homogeneity, it is calculated as (rows - 1) * (columns - 1).
Example:
In a 3x4 contingency table (3 rows, 4 columns), the Degrees of Freedom (df) would be (3-1) * (4-1) = 2 * 3 = 6.
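The arithmetic in the example can be checked in one line (the 3x4 dimensions are taken from the example above):

```python
# 3x4 contingency table: 3 rows, 4 columns.
rows, cols = 3, 4
df = (rows - 1) * (cols - 1)
print(df)  # 6
```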
Expected Counts
The number of observations that would be anticipated in each cell of a contingency table if the null hypothesis (e.g., no association or no difference) were true.
Example:
In a two-way table of 200 students, if the row total for 'prefers online learning' is 100 and the column total for '10th graders' is 80, the Expected Count for that cell is (100 * 80) / 200 = 40, assuming no association between preference and grade level.
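For a two-way table, each cell's expected count is (row total * column total) / grand total. A minimal sketch with made-up survey totals:

```python
# Hypothetical totals from a survey of 200 students:
# row total = students who prefer online learning, column total = one grade level.
row_total, col_total, grand_total = 100, 80, 200

# Expected count for that cell under the null hypothesis of no association.
expected = row_total * col_total / grand_total
print(expected)  # 40.0
```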
Independence Condition (10% condition)
When sampling without replacement, this condition states that the sample size (n) must be less than 10% of the population size (N) to ensure that individual observations are approximately independent.
Example:
If you survey 50 students from a high school with 800 students, the Independence Condition (10% condition) is met because 50 is less than 10% of 800 (which is 80).
Large Counts Condition
A condition for chi-square tests requiring that all expected counts in the contingency table must be at least 5. This ensures the sampling distribution of the test statistic is approximately chi-square.
Example:
When analyzing survey data on pet preferences, if the expected number of people who prefer hamsters is 3, the Large Counts Condition is violated, and the chi-square test may not be appropriate.
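A quick way to screen a table of expected counts is to check whether every cell clears the threshold. The counts below are hypothetical:

```python
# Hypothetical expected counts for pet preferences across two groups.
expected = [[12.0, 8.5, 3.0],
            [14.0, 9.5, 4.0]]

# Large Counts Condition: every expected count must be at least 5.
large_counts_ok = all(e >= 5 for row in expected for e in row)
print(large_counts_ok)  # False: two cells fall below 5
```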
Null Hypothesis (H0)
The statement of no effect, no difference, or no association between variables or populations. It is the hypothesis assumed to be true until evidence suggests otherwise.
Example:
For a study comparing two teaching methods, the Null Hypothesis (H0) would state that there is no difference in student performance between Method A and Method B.
P-value
The probability of observing a test statistic as extreme as, or more extreme than, the one calculated, assuming the null hypothesis is true. A small p-value suggests evidence against the null hypothesis.
Example:
If a study yields a P-value of 0.02, there is only a 2% chance of seeing results at least as extreme as those observed if the null hypothesis were true, leading to rejection of the null hypothesis at the 0.05 significance level.
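As an illustration, when the degrees of freedom are even, the chi-square tail probability has a closed form, exp(-x/2) multiplied by the sum of (x/2)^i / i! for i from 0 to df/2 - 1, so a p-value can be sketched without a statistics library (scipy.stats.chi2.sf would give the same value):

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with EVEN df, via the closed-form
    series exp(-x/2) * sum of (x/2)^i / i! for i from 0 to df/2 - 1."""
    if df % 2 != 0 or df <= 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# With df = 2 the series collapses to exp(-x/2):
p = chi2_sf(7.0, 2)
print(round(p, 3))  # 0.03, small enough to reject H0 at the 0.05 level
```

For instance, a statistic of 15.7 with 6 degrees of freedom (a hypothetical pairing) yields a p-value of roughly 0.015.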
Randomly Assigned Treatments
A key principle in experimental design where subjects are assigned to different treatment groups using a random process, helping to balance lurking variables across groups.
Example:
In a medical trial, patients are Randomly Assigned Treatments (e.g., new drug vs. placebo) to ensure that any observed differences in outcomes are likely due to the treatment and not pre-existing conditions.
Significance Level (α)
A predetermined threshold (commonly 0.05) used to decide whether to reject the null hypothesis. If the p-value is less than or equal to alpha, the null hypothesis is rejected.
Example:
Setting the Significance Level (α) at 0.01 means that a researcher is willing to accept only a 1% chance of making a Type I error (incorrectly rejecting a true null hypothesis).
Simple Random Sample (SRS)
A sampling method where every individual, and every possible sample of a given size, has an equal chance of being selected from the population.
Example:
To get a representative view of student opinions, a school might use a random number generator to select 100 student IDs for a survey, ensuring a Simple Random Sample (SRS).
Stratified Random Sample
A sampling method where the population is divided into non-overlapping subgroups (strata) based on a shared characteristic, and then a simple random sample is drawn from each stratum.
Example:
To study opinions across different grade levels, a researcher might take a Stratified Random Sample by randomly selecting 20 students from each of the 9th, 10th, 11th, and 12th grades.