
Representing Two Categorical Variables

Noah Martinez



Study Guide Overview

This study guide covers analyzing relationships between categorical variables. It explores two-way tables (contingency tables), including joint and marginal relative frequencies. Visualizations covered include side-by-side bar graphs, segmented bar graphs, and mosaic plots. Finally, it explains how to determine associations between variables from these representations, emphasizing the distinction between correlation and causation.

Exploring Relationships Between Categorical Variables 📊

Hey there! Let's dive into how we can visualize and analyze data from two categorical variables. This is a crucial area, and understanding these concepts will definitely boost your confidence on the exam. We'll cover various graphical and numerical methods, focusing on how they help us identify associations between variables. Remember, correlation doesn't equal causation! Let's get started!




Two-Way Tables (Contingency Tables) 🚦

Two-way tables, or contingency tables, are a fantastic way to organize data from two categorical variables. Think of them as grids where each cell shows the count or percentage of data points that fall into a specific combination of categories.


Quick Fact

Two-way tables display how individuals are distributed across different categories. They are the foundation for many other types of analysis.


Here's an example:

Two-Way Table Example

Caption: This table shows the distribution of survey participants based on their gender and their perceived chance of getting rich.


In this table, we can see that:

  • 194 participants felt they had almost no chance of getting rich.
  • 2367 participants were female.
  • 758 males thought there was a good chance they could get rich.
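Totals like 194 (everyone in one opinion column) and 2367 (everyone in the female row) are marginal totals: you get them by adding across a row or down a column. The full gender-by-wealth table is not reproduced in this text, so the minimal Python sketch below uses the political-affiliation counts from the practice questions instead. It assumes the table is stored as a nested dictionary (row category → column category → count). The function name `marginal_totals` is hypothetical and used only for illustration.

```python
def marginal_totals(table):
    """Return (row_totals, column_totals, grand_total) for a two-way table.

    `table` maps each row category to a dict of column category -> count.
    """
    # Add across each row
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    # Add down each column
    column_totals = {}
    for cells in table.values():
        for column, count in cells.items():
            column_totals[column] = column_totals.get(column, 0) + count
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


# Counts from the political-affiliation practice question
policy = {
    "Republican":  {"Support": 150, "Oppose": 50,  "Undecided": 25},
    "Democrat":    {"Support": 100, "Oppose": 120, "Undecided": 30},
    "Independent": {"Support": 80,  "Oppose": 90,  "Undecided": 40},
}

rows, columns, total = marginal_totals(policy)
print(rows)     # {'Republican': 225, 'Democrat': 250, 'Independent': 210}
print(columns)  # {'Support': 330, 'Oppose': 260, 'Undecided': 95}
print(total)    # 685
```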

Joint Relative Frequencies 1️⃣

Instead of counts, we can also use joint relative frequencies in a two-way table. These are the proportions of the total sample that fall into each combination of categories. For example, the joint relative frequency of being female and almost certain about getting rich is 486/4826 ≈ 0.101 (the number of females who are almost certain divided by the total number of participants). Marginal relative frequencies, by contrast, divide a row or column total by the overall total: for example, 2367/4826 ≈ 0.490 of the participants were female.


Quick Fact

In a joint relative frequency table, the bottom-right cell (overall total) will always be 1.00.
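To make the division explicit, here is a minimal Python sketch that converts every cell of a two-way table into a joint relative frequency and confirms that the results add up to 1.00. It assumes the nested-dictionary layout (row → column → count); the function name `joint_relative_frequencies` and the short category labels are hypothetical. Because only a few cells of the gender-by-wealth table appear in this text, the full-table demo uses the college-completion counts from the practice questions.

```python
def joint_relative_frequencies(table):
    """Divide every cell count by the grand total of the two-way table."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return {
        row: {column: count / grand_total for column, count in cells.items()}
        for row, cells in table.items()
    }


# The example above: females who are almost certain, out of all participants
print(round(486 / 4826, 3))  # 0.101

# Full table from the college-completion practice question
college = {
    "Public":  {"Completed": 650, "Not completed": 350},
    "Private": {"Completed": 700, "Not completed": 300},
}
joint = joint_relative_frequencies(college)
print(joint["Public"]["Completed"])  # 0.325 (650 / 2000)

# Together, the joint relative frequencies account for the whole sample
overall = sum(value for cells in joint.values() for value in cells.values())
print(round(overall, 2))  # 1.0
```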


Side-by-Side Bar Graphs 💰

Side-by-side bar graphs display the distribution of one categorical variable for each category of the other variable, with the bars for each group placed next to each other. When the bars show percentages (conditional relative frequencies) rather than counts, we can compare the groups fairly even if they contain different numbers of people.


Here's an example:

Side-by-Side Bar Graph Example

Caption: This graph shows the distribution of opinions on wealth by gender.


In this graph, gender is the dividing category, and each bar shows the percentage of participants of that gender who hold a specific opinion about their chance of getting rich.
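Each bar in a side-by-side bar graph is a conditional relative frequency: the cell count divided by the total for that group. The minimal Python sketch below computes these proportions and prints a rough text version of the graph, with `#` characters standing in for bar length (40 characters = 100%). It uses the political-affiliation counts from the practice questions because the gender-by-wealth data are not fully reproduced here; the function name `conditional_distributions` is hypothetical, and a real graph would normally be drawn with a plotting library instead.

```python
def conditional_distributions(table):
    """For each row category, divide its cell counts by that row's total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {column: count / row_total for column, count in cells.items()}
    return result


policy = {
    "Republican":  {"Support": 150, "Oppose": 50,  "Undecided": 25},
    "Democrat":    {"Support": 100, "Oppose": 120, "Undecided": 30},
    "Independent": {"Support": 80,  "Oppose": 90,  "Undecided": 40},
}

cond = conditional_distributions(policy)

# Text stand-in for a side-by-side bar graph: one group per party,
# with one bar per response placed next to the others in that group.
for party, dist in cond.items():
    print(party)
    for response, proportion in dist.items():
        bar = "#" * round(proportion * 40)  # 40 characters represent 100%
        print(f"  {response:<10} {bar} {proportion:.1%}")
```

Using percentages rather than raw counts matters here because the three parties have different totals (225, 250, and 210).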


Segmented Bar Graphs 🍁

Segmented bar graphs are similar to side-by-side bar graphs, but instead of placing bars next to each other, they stack the proportions as segments within a single bar for each category of the other variable. This makes it easy to compare the conditional distribution of one variable across the categories of the other.


Memory Aid

Think of segmented bar graphs as stacked bars, where each segment represents a different category within the main bar.


Here's an example:

Segmented Bar Graph Example

Caption: This graph shows the distribution of opinions on wealth within each gender.


In this case, each bar represents a gender, and the segments within each bar show the proportion of that gender holding each opinion about wealth. The segments in each bar add up to 100%, so every bar has the same height and the responses can be compared directly across genders.
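Stacking works by turning each group's counts into proportions and then running a cumulative sum: each segment starts where the previous one ended, and the last segment always tops out at 1.00 (100%). A minimal Python sketch of that calculation, using the college-completion counts from the practice questions; the function name `segment_bounds` is hypothetical.

```python
from itertools import accumulate


def segment_bounds(counts):
    """Return (category, bottom, top) for each stacked segment of one bar.

    Counts are converted to proportions of the bar's own total, so the
    top of the last segment always lands at 1.0 (100%).
    """
    total = sum(counts.values())
    proportions = [count / total for count in counts.values()]
    tops = list(accumulate(proportions))   # running total of proportions
    bottoms = [0.0] + tops[:-1]            # each segment starts where the last ended
    return list(zip(counts, bottoms, tops))


college = {
    "Public":  {"Completed": 650, "Not completed": 350},
    "Private": {"Completed": 700, "Not completed": 300},
}

for school, counts in college.items():
    for category, bottom, top in segment_bounds(counts):
        print(f"{school:<8} {category:<14} from {bottom:.2f} to {top:.2f}")
```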


Mosaic Plots 🪢

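Mosaic plots are a bit different. They divide the plot into rectangles, where the area of each rectangle is proportional to the joint relative frequency of each combination of categories. The width of each bar is proportional to the number of people in that category of the variable on the horizontal axis (its marginal relative frequency), and the height of each segment within a bar is the conditional relative frequency of the other variable's category. Width times height therefore gives the joint relative frequency. Think of them as a visual version of a two-way table.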


Quick Fact

In mosaic plots, the area of each region represents the joint relative frequency.
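The width × height = area relationship can be checked numerically. The minimal Python sketch below lays out mosaic-plot rectangles on a unit square: each group's column is as wide as its marginal relative frequency, and each segment is as tall as its conditional relative frequency. It uses the political-affiliation counts from the practice questions (grand total 685), so, for example, the Republican/Support rectangle has area 150/685 ≈ 0.219. The function name `mosaic_rectangles` and the tuple layout are hypothetical.

```python
def mosaic_rectangles(table):
    """Compute the rectangle for each cell of a mosaic plot on a unit square.

    Column width   = marginal relative frequency of the group.
    Segment height = conditional relative frequency within that group.
    Area = width * height = joint relative frequency of the cell.
    Returns tuples of (group, category, x, y, width, height).
    """
    grand_total = sum(sum(cells.values()) for cells in table.values())
    rects = []
    x = 0.0
    for group, cells in table.items():
        group_total = sum(cells.values())
        width = group_total / grand_total
        y = 0.0
        for category, count in cells.items():
            height = count / group_total
            rects.append((group, category, x, y, width, height))
            y += height   # stack the next segment on top
        x += width        # next column starts where this one ends
    return rects


policy = {
    "Republican":  {"Support": 150, "Oppose": 50,  "Undecided": 25},
    "Democrat":    {"Support": 100, "Oppose": 120, "Undecided": 30},
    "Independent": {"Support": 80,  "Oppose": 90,  "Undecided": 40},
}

for group, category, x, y, width, height in mosaic_rectangles(policy):
    print(f"{group:<12} {category:<10} width={width:.3f} "
          f"height={height:.3f} area={width * height:.3f}")
```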


Here's an example:

Mosaic Plot Example

Caption: This mosaic plot shows the relationship between gender and opinions on wealth, with areas proportional to joint relative frequencies.


Determining Associations from Graphical Representations 🪢

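So, how do we use these graphs to determine if there's an association between two categorical variables? Compare the conditional distributions: if the heights of corresponding bars or segments differ noticeably from one category to the next, the graph suggests an association. This means that individuals in one category of the first variable are more likely than others to fall into a particular category of the second variable. If the conditional distributions are the same (or nearly the same) for every category, there is no association.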
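One way to put a number on "differ noticeably" is to compute the conditional proportion of a single response in every group and look at the gap between the largest and smallest. The minimal Python sketch below does this for both practice-question tables. The function names `conditional_proportions` and `proportion_gap` are hypothetical, and the code applies no cutoff: deciding how large a gap counts as an association is still a judgment you make in context.

```python
def conditional_proportions(table, response):
    """Proportion of each group giving `response` (cell count / group total)."""
    return {group: cells[response] / sum(cells.values())
            for group, cells in table.items()}


def proportion_gap(table, response):
    """Largest minus smallest conditional proportion of `response` across groups."""
    props = conditional_proportions(table, response)
    return max(props.values()) - min(props.values())


policy = {
    "Republican":  {"Support": 150, "Oppose": 50,  "Undecided": 25},
    "Democrat":    {"Support": 100, "Oppose": 120, "Undecided": 30},
    "Independent": {"Support": 80,  "Oppose": 90,  "Undecided": 40},
}
college = {
    "Public":  {"Completed": 650, "Not completed": 350},
    "Private": {"Completed": 700, "Not completed": 300},
}

print(f"{proportion_gap(policy, 'Support'):.3f}")     # 0.286 (66.7% vs. 38.1%)
print(f"{proportion_gap(college, 'Completed'):.3f}")  # 0.050 (70% vs. 65%)
```

A gap of zero in every response category would mean identical conditional distributions, that is, no association in these data.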


Key Concept

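An association between two categorical variables means that they are not independent: knowing an individual's category for one variable changes which category of the other variable they are likely to fall into. However, it does NOT imply causation!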


For example, if we're looking at class level (junior, senior, etc.) and homework completion, we can use a side-by-side bar graph or a mosaic plot. If the proportion of students completing homework on time differs noticeably across class levels, it suggests an association between class level and homework completion.


Common Mistake

Remember, correlation does not imply causation! Just because two variables are associated doesn't mean one causes the other. There could be other factors at play.


Exam Tip

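When analyzing graphs, compare the conditional proportions shown by the heights of corresponding bars or segments. Noticeable differences suggest an association between the variables. In a mosaic plot, the bar widths show the marginal distribution of one variable, so differing widths alone do not indicate an association.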


Understanding how to interpret two-way tables and various graphical representations is essential for both multiple-choice and free-response questions. Pay close attention to the differences between these methods.


Practice Questions


Multiple Choice Questions

  1. A survey asked a random sample of adults in the United States about their political affiliation and their views on a certain policy. The results are summarized in the table below:

|             | Support | Oppose | Undecided |
|-------------|---------|--------|-----------|
| Republican  | 150     | 50     | 25        |
| Democrat    | 100     | 120    | 30        |
| Independent | 80      | 90     | 40        |

Which of the following statements is true?

(A) There is no association between political affiliation and views on the policy. (B) There is a strong association between political affiliation and views on the policy. (C) The majority of Republicans support the policy. (D) The majority of Democrats oppose the policy. (E) The majority of Independents are undecided about the policy.

  2. A researcher is studying the relationship between pet ownership and stress levels. They collect data from a group of participants and create a segmented bar graph. Which of the following is best determined from this graph?

(A) The exact number of participants who own a pet and have low stress. (B) The exact number of participants who do not own a pet and have high stress. (C) The proportion of pet owners that have high, medium, and low stress. (D) The average stress level of pet owners. (E) The average stress level of non-pet owners.

Free Response Question

A study was conducted to investigate the association between the type of college (public or private) and the likelihood of students completing their degree within four years. The data are summarized in the following table:

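|                 | Completed in 4 Years | Did Not Complete in 4 Years | Total |
|-----------------|----------------------|-----------------------------|-------|
| Public College  | 650                  | 350                         | 1000  |
| Private College | 700                  | 300                         | 1000  |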

(a) Calculate the marginal distribution of college type.

(b) Calculate the conditional distribution of completion status for each type of college.

(c) Construct a side-by-side bar graph to display the conditional distributions calculated in part (b).

(d) Based on your calculations and the graph, is there an association between the type of college and the likelihood of completing a degree within four years? Explain your reasoning.

Answer Key & Scoring Guidelines

Multiple Choice Questions

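  1. (B) There is a strong association between political affiliation and views on the policy. The proportion who support the policy is about 66.7% for Republicans (150/225), 40.0% for Democrats (100/250), and 38.1% for Independents (80/210), so the conditional distributions differ noticeably. Note that statement (C) is also true as written, since 150 of 225 Republicans is a majority.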
  2. (C) The proportion of pet owners that have high, medium, and low stress.

Free Response Question

(a) Marginal Distribution of College Type

  • Public College: 1000 / 2000 = 0.50 or 50%

  • Private College: 1000 / 2000 = 0.50 or 50%

  • 1 point for correctly calculating the marginal distribution of college type.

(b) Conditional Distribution of Completion Status

  • Public College:

    • Completed in 4 Years: 650 / 1000 = 0.65 or 65%
    • Did Not Complete in 4 Years: 350 / 1000 = 0.35 or 35%
  • Private College:

    • Completed in 4 Years: 700 / 1000 = 0.70 or 70%
    • Did Not Complete in 4 Years: 300 / 1000 = 0.30 or 30%
  • 1 point for correctly calculating the conditional distribution of completion status for each type of college.

(c) Side-by-Side Bar Graph

  • A side-by-side bar graph with two groups of bars, one for public and one for private colleges. Each group contains two adjacent bars representing completion status (completed in 4 years and did not complete in 4 years).

  • The y-axis should represent the proportions or percentages.

  • Appropriate labels for axes, bars, and segments are needed.

  • 1 point for a correctly constructed side-by-side bar graph.

(d) Association Analysis

  • Yes, there is an association. The proportion of students completing their degree within four years is different for public and private colleges. Specifically, a higher proportion of students at private colleges complete their degree within four years (70% vs. 65%).

  • Explanation should refer to the differences in conditional proportions calculated in part (b) and be supported by the graph in part (c).

  • 1 point for correctly identifying the association and providing a valid explanation supported by the calculations and the graph.


You've got this! 🚀
