Sampling Distributions for Differences in Sample Means

Jackson Hernandez

9 min read

Study Guide Overview

This study guide covers sampling distributions of the difference between two means. It explains key formulas for the mean and standard deviation of this distribution. The guide emphasizes the Central Limit Theorem (CLT) and its application to these distributions, including the condition of large sample sizes (n ≥ 30). Finally, it provides a practice problem and further practice questions with solutions, focusing on formula application, CLT conditions, and interpretation of results.

Sampling Distributions of the Difference Between Two Means

Hey there, future AP Stats rockstar! 🌟 Let's break down the sampling distribution of the difference between two means. This is a big topic, but we'll make it super clear and easy to remember. Let's get started!

Formulas and Key Concepts

First, let's talk about the formulas. Remember, we're dealing with the difference between two sample means (x̄1 - x̄2). The goal here is to understand how these sample differences vary if we were to take many, many samples.

Key Concept

The standard deviation of the sampling distribution of the difference between two means is found using the Pythagorean Theorem of Statistics. This is a fancy way of saying we add the variances (not the standard deviations!) of the two sample means, then take the square root. Adding variances like this is valid only when the two samples are independent.

Here are the key formulas:

  • Mean of the sampling distribution: μ(x̄1 - x̄2) = μ1 - μ2
  • Standard deviation of the sampling distribution:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Where:

  • μ1 and μ2 are the population means.
  • σ1 and σ2 are the population standard deviations.
  • n1 and n2 are the sample sizes.
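The two formulas above can be sketched as a small Python helper. The function name `diff_means_distribution` and the example numbers are made up for illustration, and the calculation assumes the two samples are independent.

```python
import math

def diff_means_distribution(mu1, mu2, sigma1, sigma2, n1, n2):
    """Return (mean, standard deviation) of the sampling distribution of x̄1 - x̄2.

    Assumes the two samples are independent.
    """
    mean = mu1 - mu2
    # Add the variances of the two sample means, then take the square root
    sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return mean, sd

# Made-up illustration values (not from the guide)
mean, sd = diff_means_distribution(mu1=70, mu2=65, sigma1=8, sigma2=6, n1=64, n2=36)
print(mean, sd)
```

Notice that the sample sizes only affect the spread, never the center.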

Memory Aid

Think of it like this: When combining the variability of two distributions, you can't just add standard deviations. You have to add the variances (squared standard deviations) and then take the square root to get the combined standard deviation. It's like combining the sides of a right triangle to get the hypotenuse!
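A quick worked example makes the triangle picture concrete. The numbers are made up for illustration: suppose the two sample means come from independent samples and have standard deviations 3 and 4 (that is, σ1/√n1 = 3 and σ2/√n2 = 4).

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{3^2 + 4^2} = \sqrt{25} = 5 \quad (\text{not } 3 + 4 = 7)$$

The legs are 3 and 4, and the hypotenuse is 5.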

[Figure: formulas for the sampling distribution of the difference in sample means. Source: AP Statistics Formula Sheet]

[Figure: sampling distribution of the difference in sample means. Source: The AP Statistics CED]

Normal Condition: Central Limit Theorem (CLT) 🎈

Now, let's talk about when we can assume our sampling distribution is approximately normal. This is crucial because many statistical tests rely on this assumption.

Quick Fact

If both populations are normally distributed, then the sampling distribution of the difference in sample means is also normally distributed. 🎉

But what if the populations aren't normal? That's where the Central Limit Theorem (CLT) comes to the rescue!

Key Concept

Central Limit Theorem (CLT): If your sample sizes are large enough (typically n ≥ 30 for both samples), then the sampling distribution of the difference in sample means will be approximately normal, regardless of the shape of the original population distributions.
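To see the CLT at work, here is a minimal simulation sketch using only Python's standard library. Everything specific in it is an assumption chosen for illustration: the two populations are exponential (strongly right-skewed) with means 5 and 3, the sample sizes are 40 and 50, and the experiment is repeated 5000 times.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

N1, N2 = 40, 50          # sample sizes (both at least 30)
MEAN1, MEAN2 = 5.0, 3.0  # population means of two right-skewed populations

def sample_mean(n, pop_mean):
    # expovariate takes the rate (1 / mean); exponential data are right-skewed
    return statistics.fmean(random.expovariate(1 / pop_mean) for _ in range(n))

# Repeat the "take two samples, subtract the means" experiment many times
diffs = [sample_mean(N1, MEAN1) - sample_mean(N2, MEAN2) for _ in range(5000)]

# For an exponential population the standard deviation equals the mean
theory_mean = MEAN1 - MEAN2
theory_sd = math.sqrt(MEAN1**2 / N1 + MEAN2**2 / N2)

sim_mean = statistics.fmean(diffs)
sim_sd = statistics.stdev(diffs)
# A normal distribution puts about 68% of values within one SD of the mean
within_one_sd = sum(abs(d - theory_mean) <= theory_sd for d in diffs) / len(diffs)

print(f"mean: simulated {sim_mean:.3f}, formula {theory_mean:.3f}")
print(f"SD:   simulated {sim_sd:.3f}, formula {theory_sd:.3f}")
print(f"share within one SD of the mean: {within_one_sd:.3f}")
```

If the simulated values land close to the formula values, and roughly 68% of the differences fall within one standard deviation, the normal approximation is doing its job even though neither population is normal.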

Exam Tip

Always check the conditions! Before you use normal-based techniques, make sure you've checked that either the populations are normal or both sample sizes are at least 30. This is a common point where students lose marks.

Practice Problem

Let's apply what we've learned to a real-world example. Imagine you're a publisher comparing the sales of romance and science fiction novels. You take random samples of 50 books from each genre.

  • Romance: x̄1 = 500 copies, s1 = 100 copies
  • Science Fiction: x̄2 = 400 copies, s2 = 150 copies

a) Explain what the sampling distribution for the difference in sample means represents and why it is useful in this situation.

b) Suppose that the true population mean number of copies sold for romance novels is actually 550 copies and the true population mean number of copies sold for science fiction novels is actually 450 copies. Describe the shape, center, and spread of the sampling distribution for the difference in sample means in this case.

c) Explain why the Central Limit Theorem applies to the sampling distribution for the difference in sample means in this situation.

d) Discuss one potential source of bias that could affect the results of this study, and explain how it could influence the estimate of the difference in population means.

Answer

a) The sampling distribution for the difference in sample means represents the distribution of all possible differences in sample means (x̄1 - x̄2) that we would get if we repeated the study many times. It's useful because it allows us to make inferences about the true difference in population means (μ1 - μ2) based on our sample data. It's like having a map of all possible outcomes, which helps us understand how likely our observed difference is.

b) Shape: approximately normal, by the Central Limit Theorem, because both samples contain 50 books. Center: the mean is μ1 - μ2 = 550 - 450 = 100 copies. Spread: the standard deviation comes from the formula we discussed earlier. The population standard deviations are not given, so we use the sample standard deviations as estimates: √(100²/50 + 150²/50) = √650 ≈ 25.5 copies.

c) The Central Limit Theorem applies because both sample sizes (n1 = 50 and n2 = 50) are greater than 30. This means that even if the original sales distributions for each genre are not normal, the sampling distribution of the difference in sample means will be approximately normal. This allows us to use normal-based statistical methods.

d) One potential source of bias is selection bias from how the books are sampled (undercoverage). For example, if the sample is drawn only from retailers where romance novels sell especially well, it would overrepresent strong-selling romance titles and overestimate the true mean sales of romance novels. Similarly, if science fiction sells mostly through online retailers that the sample misses, we would underestimate its mean sales. Either error would distort the estimated difference in means, in this case making it too large.
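For part (b), the center and spread can be checked with a few lines of Python. This is only a sketch: it treats the sample standard deviations as stand-ins for the unknown population values, and the last line (the chance that a pair of samples shows science fiction ahead) is an extra illustration rather than part of the original question.

```python
import math
from statistics import NormalDist

n1 = n2 = 50
mu1, mu2 = 550, 450   # population means given in part (b)
s1, s2 = 100, 150     # sample SDs, used as estimates of the population SDs

center = mu1 - mu2
spread = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Approximately normal model for x̄1 - x̄2 (justified by the CLT, n = 50 each)
dist = NormalDist(center, spread)

print(center, round(spread, 2))
# Chance that the sample difference is below 0 under this model
print(dist.cdf(0))
```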

Final Exam Focus 🎯

Okay, you're almost there! Here's what to focus on for the exam:

  • Understanding and applying the formulas for the mean and standard deviation of the sampling distribution of the difference between two means. Make sure you can calculate these values correctly.
  • Knowing when and how to apply the Central Limit Theorem. Remember the n ≥ 30 rule for both samples.
  • Interpreting the sampling distribution in context. Be able to explain what the distribution represents and why it's useful for making inferences.
  • Identifying potential sources of bias. This is a common theme in FRQs, so be prepared to discuss how different types of bias can affect your results.

Exam Tip

Time Management: On the exam, quickly identify the type of problem you're facing (difference of means, proportions, etc.). Then, jot down the relevant formulas and conditions you need to check. This will save you time and prevent careless errors.

Common Mistake

Common Pitfall: Forgetting to check the conditions for normality. Always double-check the sample sizes or whether the populations are normally distributed before proceeding with normal-based techniques. This is a common mistake that can cost you points.
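The normality check described in this guide boils down to a single either/or rule, which can be written as a tiny function. The name `normal_condition_met` and its parameters are hypothetical, and the sketch covers only the shape condition, not randomness or independence.

```python
def normal_condition_met(n1, n2, populations_normal=False, min_n=30):
    """Return True if a normal model for x̄1 - x̄2 is reasonable under the guide's rule.

    Either both populations are normal, or both sample sizes are at least min_n.
    """
    return populations_normal or (n1 >= min_n and n2 >= min_n)

print(normal_condition_met(50, 50))                           # both samples large
print(normal_condition_met(35, 12))                           # one sample too small
print(normal_condition_met(35, 12, populations_normal=True))  # normal populations
```

Note that one large sample is not enough: the rule needs both sample sizes to reach 30 when the populations are not known to be normal.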

Practice Questions

Here are some practice questions to test your understanding:

Practice Question

Multiple Choice Questions:

  1. Two independent random samples are selected from two populations. Sample 1 has a sample size of 40 and a sample mean of 75. Sample 2 has a sample size of 50 and a sample mean of 60. Assume that the population standard deviations are 10 and 12, respectively. What is the standard deviation of the sampling distribution of the difference in sample means? (A) 1.85 (B) 2.32 (C) 2.56 (D) 2.84 (E) 3.02

  2. A researcher wants to compare the mean scores of two groups on a standardized test. Group A has 35 participants, and Group B has 45 participants. The researcher knows that the population distributions are not normal. Which of the following is true about the sampling distribution of the difference in sample means? (A) It is exactly normal. (B) It is approximately normal due to the Central Limit Theorem. (C) It is skewed to the left. (D) It is skewed to the right. (E) It cannot be determined.

Free Response Question:

A study was conducted to compare the effectiveness of two different fertilizers on plant growth. A random sample of 30 plants was treated with Fertilizer A, and another random sample of 35 plants was treated with Fertilizer B. The mean growth for Fertilizer A was 15 cm with a standard deviation of 3 cm, and the mean growth for Fertilizer B was 18 cm with a standard deviation of 4 cm.

(a) What is the mean of the sampling distribution of the difference in sample means (Fertilizer A - Fertilizer B)? (b) What is the standard deviation of the sampling distribution of the difference in sample means? (c) What is the probability that the sample mean growth for Fertilizer A is greater than the sample mean growth for Fertilizer B? (d) Explain whether the Central Limit Theorem applies in this situation. Justify your answer.

Scoring Breakdown:

(a) (1 point)

  • Correct mean: 15 - 18 = -3 cm

(b) (2 points)

  • Correct formula: $\sqrt{\frac{3^2}{30} + \frac{4^2}{35}}$
  • Correct calculation: $\sqrt{\frac{9}{30} + \frac{16}{35}} \approx 0.87$

(c) (2 points)

  • Recognizing that the sampling distribution is approximately normal (due to the CLT), with the mean from part (a) and the standard deviation from part (b)
  • Calculating the probability $P(\bar{x}_A - \bar{x}_B > 0)$: $z = \frac{0 - (-3)}{0.87} \approx 3.45$, so $P(Z > 3.45) \approx 0.0003$ (or very close to 0)

(d) (2 points)

  • Correctly stating that the Central Limit Theorem applies because both sample sizes (n = 30 and n = 35) are greater than or equal to 30.
  • Justification: The Central Limit Theorem states that the sampling distribution of the sample mean becomes approximately normal as the sample size increases, regardless of the shape of the population distribution.
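If you want to check the arithmetic in parts (a) through (c) yourself, a short Python sketch like the one below will do it. It follows the same approach as the scoring breakdown, treating the given means and standard deviations as the values that define the sampling distribution. The variable names are just for illustration.

```python
import math
from statistics import NormalDist

mean_a, mean_b = 15, 18   # mean growth in cm
sd_a, sd_b = 3, 4         # standard deviations in cm
n_a, n_b = 30, 35         # sample sizes

# (a) center of the sampling distribution of x̄A - x̄B
center = mean_a - mean_b

# (b) spread: add the variances of the two sample means, then take the square root
spread = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

# (c) P(x̄A - x̄B > 0) using the standard normal distribution
z = (0 - center) / spread
p_a_greater = 1 - NormalDist().cdf(z)

print(center, round(spread, 2), round(z, 2), round(p_a_greater, 4))
```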

You've got this! Remember to stay calm, review these notes, and trust in your preparation. You're going to do great! 💪
