Skills Focus: Selecting an Appropriate Inference Procedure for Categorical Data

Isabella Lopez
Study Guide Overview
This AP Statistics study guide covers chi-squared tests for categorical data, focusing on choosing the correct test: Goodness of Fit, Independence, and Homogeneity. It explains how to identify the appropriate test based on the number of samples and variables, provides practice problems and FRQs, and emphasizes writing hypotheses in context. The guide also reviews conditions for inference and offers exam tips.
#AP Statistics: Chi-Squared Tests - Your Ultimate Review
Hey there, future AP Stats pro! Let's break down those tricky chi-squared tests and get you feeling confident for the exam. This guide is designed to be your go-to resource, especially the night before the big day. Let's make this easy and engaging!
#Unit 8: Inference for Categorical Data
This unit is super important, so let's dive right in! We'll focus on making sure you can confidently choose the right chi-squared test. Remember, this is a high-stakes area, so mastering it will pay off big time.
#Choosing the Right Chi-Squared Test: The Core Skill
The most crucial part of Unit 8 is selecting the correct chi-squared test. It's a common source of confusion, but we'll clear it up right now! Here's the breakdown:
Goodness of Fit Test: One sample, one categorical variable with more than two categories. Think: "Does this sample fit a known distribution?"
Test for Independence: One sample, two categorical variables with multiple categories. Think: "Are these two variables related?"
Test for Homogeneity: Two or more samples, one categorical variable with multiple categories. Think: "Are these populations the same?"
Mnemonic: G.I.H. (Goodness, Independence, Homogeneity) helps you remember the order. Also, think:
- Goodness of Fit: One sample, one variable.
- Independence: One sample, two variables.
- Homogeneity: Two or more samples, one variable.
Pay close attention to the number of samples and variables. This is the key to choosing the correct test.
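The rule above can be sketched as a small lookup. This is only an illustrative sketch: the function name `choose_chi_squared_test` and its two parameters are hypothetical, and it assumes you have already counted the samples and categorical variables in the problem.

```python
def choose_chi_squared_test(num_samples: int, num_variables: int) -> str:
    """Pick a chi-squared test from the study design.

    num_samples: number of separate samples (or treatment groups).
    num_variables: number of categorical variables recorded per individual.
    """
    if num_samples == 1 and num_variables == 1:
        return "goodness of fit"
    if num_samples == 1 and num_variables == 2:
        return "independence"
    if num_samples >= 2 and num_variables == 1:
        return "homogeneity"
    # Any other design is outside the three cases covered in this guide.
    raise ValueError("design does not match one of the three chi-squared tests")


# One sample of students, two variables recorded on each student.
print(choose_chi_squared_test(num_samples=1, num_variables=2))
```

On the exam you still have to justify the choice in words; the function only encodes the counting step.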
#Visualizing the Chi-Squared Tests
[Image: visual summary of the chi-squared tests. Source: Dan Shuster]
The number of samples and the number of variables guide you to the correct test. Keep this in mind as you work through problems.
#Example: Breaking Down an FRQ
Let's look at a real AP exam question to see how this works in practice. This example is from the 2009 AP Statistics exam.
Image from released College Board material
- Identify Data Type: The data is categorical. This means we're in the realm of z-tests for proportions or chi-squared tests.
- Count Variables and Categories: We have two categorical variables (gender and job experience), each with multiple categories. This rules out z-tests for proportions, which handle only a two-category (success/failure) variable.
- Narrow Down Chi-Squared Tests:
  - The data is in a two-way table. This means it's either a test for independence or a test for homogeneity.
  - We have one sample (high school seniors). This indicates we are looking for an association between gender and job experience, not a difference between populations.
- Conclusion: We should use a chi-squared test for independence.
#Hypotheses
- H0: There is no association between gender and job experience for high school seniors in the district.
- Ha: There is an association between gender and job experience for high school seniors in the district.
Always put your hypotheses in context! The null hypothesis always states the "nothing special" outcome: no association, no difference between groups, or a match to the claimed distribution.
#Practice Problems: Putting It All Together
Let's solidify your understanding with some practice problems. Remember, the key is to identify the number of samples and variables.
#Problem 1
A researcher wants to know if the distribution of favorite ice cream flavors is the same among college students and the general population. They survey 500 college students and 1000 people from the general population.
Which test should the researcher use and why?
#Problem 2
A scientist is testing a new treatment and divides 100 patients into a treatment and a control group. They want to know if the treatment is effective at reducing the disease in male and female patients.
Which test should the scientist use and why?
#Problem 3
A travel company wants to see if their vacation package choices fit a theoretical distribution. They survey 1000 customers about beach, mountain, city, and rural vacation packages.
Which test should the travel company use and why?
#Answers
- Problem 1: Chi-Squared Test for Homogeneity. The researcher draws two separate samples (college students and the general population) and compares the distribution of one variable (favorite ice cream flavor) across them.
- Problem 2: Chi-Squared Test for Homogeneity. The scientist is comparing two groups (treatment and control) to see if their one variable (disease occurrence) is the same across groups.
- Problem 3: Chi-Squared Test for Goodness of Fit. The travel company is comparing one sample to a theoretical distribution of package choices.
A common mistake is confusing homogeneity and independence. Remember, homogeneity compares groups, while independence looks for relationships between variables within a single group.
#Final Exam Focus
Okay, you're almost there! Here's a quick rundown of what to focus on for the exam:
- Master the Chi-Squared Test Selection: This is the most important part of Unit 8. Practice identifying the correct test based on the number of samples and variables.
- Context is Key: Always write your hypotheses in the context of the problem.
- Review the Conditions for Inference: Make sure you can check the conditions for chi-squared tests: randomness (random sample or random assignment), independence (the 10% condition when sampling without replacement), and large counts (every expected count is at least 5).
- Practice FRQs: Work through released FRQs to get comfortable with the format and scoring.
Time Management: Don't spend too long on one question. If you're stuck, move on and come back later. Make sure you show all your work, even if you're not sure of the final answer.
#Practice Questions
#Multiple Choice Questions
- A researcher is investigating the relationship between hours of sleep and stress levels among college students. They survey a random sample of students and record their hours of sleep (categorized as <6 hours, 6-8 hours, >8 hours) and their stress levels (categorized as low, medium, high). Which of the following is the appropriate test to use?
  (A) One-sample t-test for a mean
  (B) Two-sample t-test for a difference in means
  (C) Chi-squared test for goodness of fit
  (D) Chi-squared test for independence
  (E) Chi-squared test for homogeneity
- A company wants to determine if the distribution of customer satisfaction ratings (very satisfied, satisfied, neutral, dissatisfied, very dissatisfied) is the same across three different store locations. They survey a random sample of customers at each store. Which of the following is the appropriate test to use?
  (A) One-sample t-test for a mean
  (B) Two-sample t-test for a difference in means
  (C) Chi-squared test for goodness of fit
  (D) Chi-squared test for independence
  (E) Chi-squared test for homogeneity
- A researcher is studying the distribution of eye color in a population. They want to know if the observed distribution of eye colors (brown, blue, green, other) matches a theoretical distribution based on genetic principles. Which of the following is the appropriate test to use?
  (A) One-sample t-test for a mean
  (B) Two-sample t-test for a difference in means
  (C) Chi-squared test for goodness of fit
  (D) Chi-squared test for independence
  (E) Chi-squared test for homogeneity
#Free Response Question
A survey was conducted at a large university to investigate the relationship between students' living arrangements (on-campus vs. off-campus) and their level of involvement in extracurricular activities (low, medium, high). The results are summarized in the table below:
Living Arrangement | Low Involvement | Medium Involvement | High Involvement | Total |
---|---|---|---|---|
On-Campus | 120 | 180 | 100 | 400 |
Off-Campus | 150 | 250 | 200 | 600 |
Total | 270 | 430 | 300 | 1000 |
(a) State the null and alternative hypotheses for this study.
(b) Calculate the expected counts for each cell in the table under the assumption that the null hypothesis is true.
(c) Calculate the chi-squared test statistic.
(d) Calculate the degrees of freedom and the p-value.
(e) Based on the p-value, what conclusion can be drawn from this study at a significance level of α = 0.05?
#FRQ Scoring Rubric
(a) Hypotheses (1 point)
- Null hypothesis: There is no association between students' living arrangements and their level of involvement in extracurricular activities.
- Alternative hypothesis: There is an association between students' living arrangements and their level of involvement in extracurricular activities.
(b) Expected Counts (2 points)
- Correctly calculates expected counts for each cell using the formula: (row total * column total) / grand total.
- Expected counts:
  - On-Campus, Low: (400 * 270) / 1000 = 108
  - On-Campus, Medium: (400 * 430) / 1000 = 172
  - On-Campus, High: (400 * 300) / 1000 = 120
  - Off-Campus, Low: (600 * 270) / 1000 = 162
  - Off-Campus, Medium: (600 * 430) / 1000 = 258
  - Off-Campus, High: (600 * 300) / 1000 = 180
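As a way to check the expected-count arithmetic, here is a minimal sketch in plain Python. The helper name `expected_counts` is hypothetical, and the final check assumes the usual AP large counts convention that every expected count is at least 5.

```python
# Observed counts from the FRQ table.
# Rows: on-campus, off-campus. Columns: low, medium, high involvement.
observed = [
    [120, 180, 100],
    [150, 250, 200],
]


def expected_counts(table):
    """Return (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print(row)

# Large counts condition: every expected count should be at least 5.
large_counts_ok = all(cell >= 5 for row in expected for cell in row)
print("Large counts condition met:", large_counts_ok)
```

The same helper works for any two-way table of counts, so it can be reused on other released FRQs.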
(c) Chi-Squared Test Statistic (2 points)
- Correctly calculates the chi-squared test statistic using the formula: χ² = Σ[(observed - expected)^2 / expected].
- χ² = [(120-108)^2 / 108] + [(180-172)^2 / 172] + [(100-120)^2 / 120] + [(150-162)^2 / 162] + [(250-258)^2 / 258] + [(200-180)^2 / 180] ≈ 8.398
(d) Degrees of Freedom and P-Value (2 points)
- Degrees of freedom = (number of rows - 1) * (number of columns - 1) = (2 - 1) * (3 - 1) = 2
- P-value: Using a chi-squared distribution with 2 degrees of freedom, p-value ≈ 0.0150
(e) Conclusion (1 point)
- Since the p-value (0.0150) is less than the significance level (0.05), we reject the null hypothesis.
- There is sufficient evidence to conclude that there is an association between students' living arrangements and their level of involvement in extracurricular activities.
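Parts (c) through (e) can be checked with a short standard-library sketch. It recomputes everything from the observed table; the variable names are illustrative. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail area of the chi-squared distribution is exp(-x/2). For other degrees of freedom you would need a table, a calculator, or a library routine such as `scipy.stats.chi2.sf`.

```python
import math

# Observed counts from the FRQ table (rows: on-campus, off-campus).
observed = [[120, 180, 100], [150, 250, 200]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over all six cells.
chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected_cell = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (obs - expected_cell) ** 2 / expected_cell

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed form for the upper tail, valid only when df == 2.
if df != 2:
    raise ValueError("this shortcut for the p-value only holds for df = 2")
p_value = math.exp(-chi_squared / 2)

alpha = 0.05
reject_null = p_value < alpha
print(f"chi-squared = {chi_squared:.3f}, df = {df}, p-value = {p_value:.4f}")
print("Reject H0:", reject_null)
```

Run on this table, the sketch gives a statistic of about 8.40 and a p-value of about 0.015, so the null hypothesis is rejected at the 0.05 level.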
Remember to show all your work, even if you're using a calculator. Partial credit is your friend!
#You've Got This!
You've made it through the chi-squared tests! Remember, the key is to stay calm, think through the problem step-by-step, and apply what you've learned. You're well-prepared for the AP Statistics exam. Now go out there and crush it!