Least Squares Regression

Isabella Lopez
#Study Guide Overview
This guide covers linear regression focusing on the least squares regression line (LSRL). Key concepts include the LSRL equation, calculating and interpreting slope and y-intercept, understanding and interpreting the coefficient of determination (R²) and standard deviation of residuals, and reading computer printouts. Practice questions and a scoring rubric are provided.
#Linear Regression: Your Ultimate Guide 🚀
Hey there, future AP Stats superstar! Let's break down linear regression, one of the most important topics on the exam. This guide will help you understand the key concepts, interpret results, and tackle any question they throw at you. Let's get started!
#Least Squares Regression Line (LSRL)
The least squares regression line (LSRL) is your best friend when modeling linear relationships. It's the line that minimizes the sum of the squared residuals. Remember, residuals are the differences between observed (actual) y-values and predicted (ŷ) values. Think of it as the line that fits the data the best.
The least squares criterion picks the slope and y-intercept that make the sum of the squared residuals as small as possible. We square the residuals for two reasons: it gives more weight to larger errors, and it keeps positive and negative residuals from canceling each other out. 🪢
The LSRL equation is: ŷ = a + bx
- ŷ = predicted value of the response variable
- x = explanatory variable
- a = y-intercept
- b = slope
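To see the "least squares" idea in action, here is a minimal Python sketch. The five data points are made up for illustration, and the helper name `sum_squared_residuals` is just a label chosen here. The slope is computed with the deviations form Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which gives the same value as the r(sy/sx) formula used in this guide.

```python
# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope (deviations form) and y-intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Compare the LSRL with a slightly different line.
sse_lsrl = sum_squared_residuals(a, b, x, y)
sse_other = sum_squared_residuals(a + 0.5, b - 0.1, x, y)

print(f"LSRL: y-hat = {a:.2f} + {b:.2f}x, sum of squared residuals = {sse_lsrl:.2f}")
print(f"Other line: sum of squared residuals = {sse_other:.2f}")
```

For this made-up data the LSRL comes out with a smaller sum of squared residuals than the nudged line, and the same is true for any other line you try.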
#LSRL—Slope ⛰️
The slope (b) represents the predicted change in the response variable (y) for every one-unit increase in the explanatory variable (x). It's how much y is expected to change when x goes up by one. The formula for the slope is:
b = r(sy/sx)
Where:
- r = correlation coefficient
- sy = standard deviation of y
- sx = standard deviation of x
Think of the slope as the 'rise over run' but with standard deviations! The correlation coefficient, r, scales this ratio to fit the data.
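If you want to check the slope formula b = r(sy/sx) on actual numbers, the sketch below is one way to do it in Python. The data set is hypothetical, and r is computed by hand from its definition so the example doesn't depend on any particular Python version.

```python
import math
import statistics

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Sample standard deviations (these divide by n - 1).
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)

# Correlation coefficient r from its definition.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
r = s_xy / math.sqrt(s_xx * s_yy)

# Slope of the LSRL: r scales the ratio of standard deviations.
b = r * (s_y / s_x)

print(f"r = {r:.4f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}, slope b = {b:.4f}")
```

Notice that the sign of b always matches the sign of r, because standard deviations are never negative.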
#Template for Interpretation
⭐ "There is a predicted increase/decrease of ______ (slope, in units of the y variable) for every 1 (unit of x variable) increase in (x in context)."
#Big Three
- Context
- Correct definition
- Word "predicted"
#LSRL—y-intercept 💛
The y-intercept (a) is the predicted value of the response variable (y) when the explanatory variable (x) is zero. It's the point where the LSRL crosses the y-axis. Remember, the LSRL always passes through the point (x̄, ȳ), where x̄ is the mean of x and ȳ is the mean of y. We can use this to find the y-intercept using the point-slope form:
ŷ - ȳ = b(x - x̄)
Solving for ŷ, we get:
ŷ = bx + (-bx̄ + ȳ)
The y-intercept is the constant term (-bx̄ + ȳ), so a = ȳ - bx̄.
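Here is a small Python sketch of that calculation. The summary statistics (x̄ = 10, ȳ = 50, b = 2) are hypothetical values picked to keep the arithmetic easy, and `predict` is just a helper name chosen here. It also checks the fact used above: the LSRL passes through (x̄, ȳ).

```python
# Hypothetical summary statistics, chosen only to keep the arithmetic simple.
x_bar = 10.0  # mean of x
y_bar = 50.0  # mean of y
b = 2.0       # slope of the LSRL

# y-intercept from the point-slope rearrangement: a = y-bar - b * x-bar
a = y_bar - b * x_bar


def predict(x):
    """Predicted response y-hat = a + b*x."""
    return a + b * x


print(f"y-intercept a = {a}")
print(f"prediction at x-bar = {predict(x_bar)} (should equal y-bar = {y_bar})")
```

Plugging x̄ into the line gives back ȳ, which is a quick sanity check you can run on any LSRL you compute by hand.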
#Template for Interpretation
⭐ "The predicted value of (y in context) is _____ when (x value in context) is 0 (units in context)."
#Big Three
- Context
- Correct definition
- Word "predicted"
#LSRL—Coefficient of Determination 🍄
The coefficient of determination, or R-squared (R²), tells you how well the LSRL fits the data. It represents the proportion of the variability in the response variable (y) that is explained by the linear relationship with the explanatory variable (x). R² is the square of the correlation coefficient (r).
R² ranges from 0 to 1: 0 means no linear relationship, and 1 means a perfect linear fit.
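To connect "proportion of variability explained" with "r squared", the Python sketch below computes R² two ways on a hypothetical data set: as 1 - SSE/SST (one minus the share of variation left over in the residuals) and as the square of r. The 1 - SSE/SST form isn't written out in this guide, but it is the standard way to express that proportion, and for a least squares line the two agree.

```python
import math

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Fit the LSRL.
b = s_xy / s_xx
a = y_bar - b * x_bar

# SST: total variation in y around its mean.
sst = s_yy
# SSE: variation left unexplained (sum of squared residuals).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / sst
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"R^2 from 1 - SSE/SST = {r_squared:.4f}")
print(f"r^2                  = {r ** 2:.4f}")
```

Both lines print the same value, so squaring r from a calculator or printout really does give the proportion of variation explained.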
#Template for Interpretation
⭐ "____% of the variation in (y in context) can be explained by its linear relationship with (x in context)."
#Big Three
- Context
- Correct definition
- Linking linear relationship
#LSRL—Standard Deviation of the Residuals 🐫
The standard deviation of the residuals (s) measures the typical distance of the data points from the LSRL. It's roughly the typical size of your prediction errors. The formula is:
s = √( Σ(y - ŷ)² / (n - 2) )
Notice that we divide by n-2, not n-1. This is because we lose two degrees of freedom when we estimate the slope and the y-intercept. You'll learn more about this in Unit 9.
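Here is a minimal Python sketch of that calculation, s = √(Σ(y - ŷ)² / (n - 2)), on a hypothetical data set. The data values are made up for illustration.

```python
import math

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Fit the LSRL.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Residual = observed y - predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Divide by n - 2 because two values (slope and y-intercept) were estimated.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(f"residuals: {[round(e, 2) for e in residuals]}")
print(f"s = {s:.4f}")
```

The residuals of an LSRL always add up to zero (up to rounding), which is why the plain average residual is useless as a measure of error and s is used instead.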
#Reading a Computer Printout 🖥️
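On the AP exam, you'll often see computer printouts. The key stats to pick out are the slope, the y-intercept, R-Sq, and the standard deviation of the residuals.
[Image: sample computer printout with the key stats highlighted]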
Always use R-Sq, NEVER R-Sq(adj)! R-Sq(adj) is for multiple regression, which is beyond the scope of AP Stats. 💡
#Final Exam Focus
Okay, you're almost there! Here’s what to focus on for the exam:
- Interpreting Slope and Y-intercept: Always include context and the word "predicted".
- Understanding R-squared: Know how to explain what percentage of the variation in y is explained by the linear relationship with x.
- Residuals: Understand what they are and what they mean.
- Computer Printouts: Practice identifying the slope, y-intercept, R-squared, and standard deviation of the residuals.
- Context is Key: Always relate your answers back to the specific scenario in the problem.
#Last-Minute Tips
- Time Management: Don't spend too long on one question. If you get stuck, move on and come back later.
- Common Pitfalls: Watch out for questions that mix up explanatory and response variables. Always double-check your context!
- FRQ Strategies: Clearly label each part of your work. Show your formulas, and write in complete sentences.
#Practice Questions
#Multiple Choice Questions
- A researcher studies the relationship between hours of exercise per week and resting heart rate. The regression analysis produces the following equation: ŷ = 75 - 2.5x. What is the correct interpretation of the slope?
a) For every one unit increase in resting heart rate, the hours of exercise per week increases by 2.5
b) For every one hour increase in exercise per week, the resting heart rate is predicted to decrease by 2.5 beats per minute.
c) For every 2.5 hour increase in exercise per week, the resting heart rate is predicted to decrease by 1 beat per minute.
d) For every one hour increase in exercise per week, the resting heart rate is predicted to increase by 2.5 beats per minute.
- The coefficient of determination (R²) for a linear regression model is 0.81. Which of the following is the correct interpretation?
a) 81% of the variation in the explanatory variable is explained by the response variable.
b) 81% of the variation in the response variable is explained by the linear relationship with the explanatory variable.
c) The correlation between the variables is 0.81.
d) 81% of the data points fall on the regression line.
- A least squares regression line is fit to a set of data. The standard deviation of the residuals is 3.2. Which of the following is a correct interpretation of this value?
a) The typical distance of a data point from the mean of the response variable is 3.2.
b) The typical distance of a data point from the least squares regression line is 3.2.
c) The average residual is 3.2.
d) The standard deviation of the response variable is 3.2.
#Free Response Question
A study was conducted to investigate the relationship between the number of hours students study per week (x) and their final exam scores (y). The following summary statistics were calculated:
- Mean hours of study (x̄): 15 hours
- Mean final exam score (ȳ): 82
- Standard deviation of study hours (sx): 5 hours
- Standard deviation of final exam scores (sy): 8
- Correlation coefficient (r): 0.75
(a) Calculate the slope of the least squares regression line.
(b) Calculate the y-intercept of the least squares regression line.
(c) Write the equation of the least squares regression line.
(d) Interpret the slope of the regression line in the context of the problem.
(e) Interpret the y-intercept of the regression line in the context of the problem.
(f) Calculate the coefficient of determination (R²) and interpret it in the context of the problem.
#FRQ Scoring Rubric
(a) Calculate the slope of the least squares regression line. (1 point)
- Correct Formula: b = r(sy/sx) (0.5 point)
- Correct Calculation: b = 0.75 * (8/5) = 1.2 (0.5 point)
(b) Calculate the y-intercept of the least squares regression line. (1 point)
- Correct Formula: a = ȳ - bx̄ (0.5 point)
- Correct Calculation: a = 82 - 1.2 * 15 = 64 (0.5 point)
(c) Write the equation of the least squares regression line. (1 point)
- Correct Equation: ŷ = 64 + 1.2x (1 point)
(d) Interpret the slope of the regression line in the context of the problem. (1 point)
- Correct Interpretation: For every additional hour of study per week, the final exam score is predicted to increase by 1.2 points. (1 point)
(e) Interpret the y-intercept of the regression line in the context of the problem. (1 point)
- Correct Interpretation: The predicted final exam score for a student who studies 0 hours per week is 64 points. (1 point)
(f) Calculate the coefficient of determination (R²) and interpret it in the context of the problem. (2 points)
- Correct Calculation: R² = r² = (0.75)² = 0.5625 or 56.25% (1 point)
- Correct Interpretation: 56.25% of the variation in final exam scores can be explained by the linear relationship with the number of hours studied per week. (1 point)
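If you'd like to double-check the rubric's arithmetic, the short Python sketch below redoes parts (a), (b), (c), and (f) from the summary statistics given in the question. The variable names are just labels chosen here.

```python
# Summary statistics from the free response question.
x_bar = 15   # mean hours of study
y_bar = 82   # mean final exam score
s_x = 5      # standard deviation of study hours
s_y = 8      # standard deviation of final exam scores
r = 0.75     # correlation coefficient

# (a) slope: b = r * (s_y / s_x)
b = r * (s_y / s_x)

# (b) y-intercept: a = y-bar - b * x-bar
a = y_bar - b * x_bar

# (f) coefficient of determination: R^2 = r^2
r_squared = r ** 2

print(f"(a) slope b = {b:.2f}")
print(f"(b) y-intercept a = {a:.2f}")
print(f"(c) LSRL: y-hat = {a:.2f} + {b:.2f}x")
print(f"(f) R^2 = {r_squared:.4f}")
```

The results match the rubric: a slope of 1.2, a y-intercept of 64, and R² of 0.5625.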
#Answers to Multiple Choice Questions
- b
- b
- b
Good luck on your AP Stats exam! You've got this!
