Glossary
Association
An indication that the distribution of one categorical variable differs across the categories of another categorical variable, suggesting a relationship between them.
Example:
If the proportion of students who pass a test is significantly different between those who attended tutoring and those who didn't, there is an association between tutoring and passing the test.
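One way to check for an association in a sample is to compare the pass rate within each tutoring group. The sketch below does this with hypothetical counts (the numbers and the name pass_rate are invented for illustration, not taken from any real study).

```python
# Hypothetical counts, invented for illustration only.
counts = {
    "tutoring": {"pass": 45, "fail": 5},
    "no tutoring": {"pass": 30, "fail": 20},
}

def pass_rate(group):
    """Proportion of the given group that passed the test."""
    row = counts[group]
    return row["pass"] / sum(row.values())

rate_tutoring = pass_rate("tutoring")        # 45 out of 50
rate_no_tutoring = pass_rate("no tutoring")  # 30 out of 50

# Unequal rates suggest an association in this sample.
print(rate_tutoring, rate_no_tutoring)
```

A gap between the two rates points to an association in the sample; deciding whether the gap is statistically significant would take a formal test, which this sketch does not perform.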
Causation
A relationship where a change in one variable directly causes a change in another variable. An association between two variables does not, by itself, establish causation.
Example:
While there might be an association between ice cream sales and drowning incidents, neither causes the other; a lurking variable like hot weather is likely the cause of both.
Conditional Distribution
The distribution of one categorical variable for a specific category of another categorical variable, showing the proportions within that specific group.
Example:
If we look only at female students, the conditional distribution of their favorite subjects (math, science, English) tells us the proportions of females who prefer each subject.
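As a minimal sketch, a conditional distribution can be computed by dividing each count in one row by that row's total. The table values and the helper name conditional_distribution below are hypothetical.

```python
# Hypothetical counts of favorite subject by gender, invented for illustration.
table = {
    "female": {"math": 20, "science": 30, "English": 50},
    "male": {"math": 40, "science": 35, "English": 25},
}

def conditional_distribution(table, group):
    """Proportions of each category within a single group (one row)."""
    row = table[group]
    row_total = sum(row.values())
    return {category: count / row_total for category, count in row.items()}

female_dist = conditional_distribution(table, "female")
print(female_dist)
```

The proportions within a group always sum to 1, because the denominator is the group total rather than the overall total.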
Joint Relative Frequencies
The proportions of the total sample that fall into each specific combination of categories for two categorical variables, each found by dividing the count in a cell by the overall total.
Example:
If 10 out of 100 students are both seniors and prefer online learning, the joint relative frequency for that combination is 10/100 = 0.10.
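The arithmetic in this example can be sketched as a small function; the name joint_relative_frequency is an illustrative choice, not a standard library function.

```python
def joint_relative_frequency(cell_count, grand_total):
    """Cell count divided by the overall total of the two-way table."""
    if grand_total <= 0:
        raise ValueError("grand_total must be positive")
    return cell_count / grand_total

# 10 seniors who prefer online learning, out of 100 students overall.
freq = joint_relative_frequency(10, 100)
print(freq)  # 0.1
```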
Marginal Distribution
The distribution of a single categorical variable, found by summing the counts or proportions across the categories of the other variable in a two-way table.
Example:
In a table showing favorite colors by gender, the marginal distribution of favorite colors would be the total count or percentage for each color, regardless of gender.
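A marginal distribution can be sketched by adding up each color's counts across both genders and then dividing by the grand total. The counts below are hypothetical.

```python
# Hypothetical counts of favorite color by gender, invented for illustration.
table = {
    "female": {"red": 10, "blue": 25, "green": 15},
    "male": {"red": 20, "blue": 20, "green": 10},
}

# Sum each color's counts over every gender (the column totals).
marginal_counts = {}
for row in table.values():
    for color, count in row.items():
        marginal_counts[color] = marginal_counts.get(color, 0) + count

grand_total = sum(marginal_counts.values())

# Convert the column totals to proportions of the whole sample.
marginal_proportions = {
    color: count / grand_total for color, count in marginal_counts.items()
}
print(marginal_counts)
print(marginal_proportions)
```

Gender plays no role in the result, which is what "regardless of gender" means in the example above.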
Mosaic Plots
A graphical display for two categorical variables where the area of each rectangle is proportional to the joint relative frequency of the corresponding combination of categories, and bar widths are proportional to the marginal distribution of one variable.
Example:
A mosaic plot could visually represent the relationship between a person's favorite season and their preferred outdoor activity, with the size of each rectangle indicating how common that combination is.
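The geometry behind a mosaic plot can be sketched without drawing anything: each bar's width is a marginal proportion, each segment's height is a conditional proportion, and their product (the area) equals the joint relative frequency. The counts, the activity names, and the function name mosaic_rectangles are hypothetical.

```python
# Hypothetical counts of preferred activity by favorite season.
table = {
    "summer": {"hiking": 30, "cycling": 30},
    "winter": {"hiking": 30, "cycling": 10},
}

def mosaic_rectangles(table):
    """Return (width, height, area) for each season/activity rectangle."""
    grand_total = sum(sum(row.values()) for row in table.values())
    rectangles = {}
    for season, row in table.items():
        row_total = sum(row.values())
        width = row_total / grand_total      # marginal proportion of the season
        for activity, count in row.items():
            height = count / row_total       # conditional proportion within the season
            rectangles[(season, activity)] = (width, height, width * height)
    return rectangles

rects = mosaic_rectangles(table)
for key, (width, height, area) in rects.items():
    print(key, round(width, 2), round(height, 2), round(area, 2))
```

A plotting library would then draw these rectangles; this sketch only computes their dimensions.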
Segmented Bar Graphs
A graphical display where each bar represents a category of one variable, and the bar is divided into segments that show the proportions of a second categorical variable within that category.
Example:
A segmented bar graph could show the proportion of students who walk, bike, or take the bus to school, with separate stacked bars for elementary, middle, and high school students.
Side-by-Side Bar Graphs
A graphical display that uses separate bars for each category of one variable, placed next to each other, to compare the distribution of a second categorical variable across those categories.
Example:
To compare favorite ice cream flavors between males and females, you could use a side-by-side bar graph with separate bars for chocolate, vanilla, and strawberry for each gender.
Two-Way Tables (Contingency Tables)
A table that organizes data for two categorical variables, displaying the counts or percentages of observations for each combination of categories.
Example:
A researcher creates a two-way table to show how many students prefer online classes versus in-person classes, broken down by their grade level (freshman, sophomore, etc.).
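A two-way table of counts can be built from raw observations by tallying each (grade level, preference) pair. The sketch below uses Python's collections.Counter on a small set of hypothetical records.

```python
from collections import Counter

# Hypothetical survey records: (grade level, class preference).
records = [
    ("freshman", "online"),
    ("freshman", "in-person"),
    ("sophomore", "online"),
    ("sophomore", "online"),
    ("junior", "in-person"),
    ("senior", "in-person"),
    ("senior", "online"),
    ("senior", "in-person"),
]

# Tally how often each combination of categories occurs.
cell_counts = Counter(records)

grade_levels = ["freshman", "sophomore", "junior", "senior"]
preferences = ["online", "in-person"]

# Arrange the tallies as rows (grade level) and columns (preference).
# Counter returns 0 for combinations that never occur.
two_way = {
    grade: {pref: cell_counts[(grade, pref)] for pref in preferences}
    for grade in grade_levels
}

for grade in grade_levels:
    print(grade, two_way[grade])
```

Dividing every cell by the number of records would turn the same table into one of joint relative frequencies.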