All Flashcards
What is the formula for calculating expected counts in a Chi-Square test?
Expected Count = (Row Total * Column Total) / Grand Total
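The expected-count formula can be applied to every cell of a two-way table at once. The following is a minimal sketch in plain Python; the function name `expected_counts` and the 2x2 table of counts are illustrative assumptions, not part of the original cards.

```python
def expected_counts(observed):
    """Return the table of expected counts for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count for each cell = (row total * column total) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (2 rows x 2 columns), made up for illustration
observed = [[30, 20],
            [20, 30]]
expected = expected_counts(observed)
print(expected)
```

In this made-up table every row total and column total is 50 and the grand total is 100, so each expected count is 50 * 50 / 100 = 25.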
What is the formula for the Chi-Square test statistic?
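χ² = Σ (O - E)² / E, where O is the observed count and E is the expected count in each cell, and the sum is taken over all cells of the table.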
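The test statistic adds up (O - E)² / E over every cell of the table. A minimal sketch of that sum in plain Python follows; the function name `chi_square_statistic` and the counts are assumptions chosen for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two same-shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Hypothetical observed counts and their matching expected counts
observed = [[30, 20],
            [20, 30]]
expected = [[25.0, 25.0],
            [25.0, 25.0]]
stat = chi_square_statistic(observed, expected)
print(stat)
```

Each of the four cells here differs from its expected count by 5, so each contributes 5² / 25 = 1 and the statistic is 4.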
How do you calculate degrees of freedom for a Chi-Square test?
df = (number of rows - 1) * (number of columns - 1)
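The degrees of freedom depend only on the shape of the table, not on the counts inside it. As a small sketch (the function name `degrees_of_freedom` and the example tables are hypothetical, and the table is assumed to exclude any totals row or column):

```python
def degrees_of_freedom(table):
    """Return (rows - 1) * (columns - 1) for a two-way table without totals."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# A hypothetical table with 2 rows and 3 columns of counts
table = [[12, 18, 20],
         [15, 10, 25]]
df = degrees_of_freedom(table)
print(df)
```

For this 2x3 table, df = (2 - 1) * (3 - 1) = 2.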
What are the differences between Chi-Square Test for Independence and Homogeneity?
Independence: One sample, two categorical variables, exploring relationships within a single group. | Homogeneity: Two or more samples, one categorical variable, comparing distributions across groups.
What are the differences in data collection methods for Independence and Homogeneity tests?
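Independence: Requires one simple random sample (SRS), with each individual classified on both variables. | Homogeneity: Requires independent random samples from each population (for example, a stratified random sample) OR randomly assigned treatments (for experiments).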
What is the Chi-Square Test for Independence?
A test used to determine if there is a significant association between two categorical variables within a single population.
What is the Chi-Square Test for Homogeneity?
A test used to determine if the distribution of a categorical variable is the same across two or more populations or treatments.
Define 'expected counts' in a Chi-Square test.
The counts we would expect in each cell of a contingency table if the null hypothesis were true. Calculated as (Row Total * Column Total) / Grand Total.
What is a null hypothesis (H0) in the context of a Chi-Square test for independence?
There is no association between two categorical variables (they are independent).
What is a null hypothesis (H0) in the context of a Chi-Square test for homogeneity?
There is no difference in the distribution of a categorical variable across populations/treatments.
What is an alternative hypothesis (Ha) in the context of a Chi-Square test for independence?
There is an association between two categorical variables (they are dependent).
What is an alternative hypothesis (Ha) in the context of a Chi-Square test for homogeneity?
There is a difference in the distribution of a categorical variable across populations/treatments.