Glossary
Coefficient of Determination (R-squared, R²)
The proportion of the variability in the response variable (y) that is explained by the linear relationship with the explanatory variable (x). It measures how well the LSRL fits the data, ranging from 0 to 1.
Example:
An R-squared of 0.65 for a model predicting ice cream sales from daily temperature means that 65% of the variation in ice cream sales can be explained by the linear relationship with temperature.
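As a sketch of the computation, using a small hypothetical temperature/sales dataset (the numbers below are illustrative, not from the 0.65 example), R² can be found as 1 − SSE/SST:

```python
# Hypothetical data: daily temperature (x) and ice cream sales (y)
x = [60, 65, 70, 75, 80]
y = [100, 120, 135, 160, 175]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and y-intercept
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sxy / sxx
a = mean_y - b * mean_x

# R-squared = 1 - SSE/SST
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # squared residuals
sst = sum((yi - mean_y) ** 2 for yi in y)                    # total variation in y
r_squared = 1 - sse / sst
print(round(r_squared, 3))
```

For this toy dataset the printed value is close to 1, since the points fall nearly on a straight line.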
Computer Printout Interpretation
The skill of identifying and correctly interpreting key statistical values (such as slope, y-intercept, R-squared, and standard deviation of residuals) from the output generated by statistical software.
Example:
When given a computer printout, a student must be able to locate the 'Coef' column to find the slope and y-intercept, and the 'R-Sq' value for the coefficient of determination.
Correlation Coefficient (r)
A standardized measure of the strength and direction of the linear relationship between two quantitative variables. It ranges from -1 to 1.
Example:
A correlation coefficient of -0.85 between daily temperature and heating bill cost indicates a strong negative linear relationship: as temperature increases, heating costs tend to decrease.
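A quick sketch of how r is computed from hypothetical temperature/heating-bill data (these numbers are made up for illustration):

```python
import math

# Hypothetical data: daily temperature (x) and heating bill cost (y)
x = [20, 30, 40, 50, 60]
y = [200, 170, 150, 120, 100]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# r = Sxy / sqrt(Sxx * Syy)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))
```

The printed r is negative and close to -1, reflecting a strong negative linear relationship: higher temperatures pair with lower heating bills.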
Explanatory Variable (x)
The variable that is used to predict or explain changes in the response variable. It is typically plotted on the horizontal (x) axis.
Example:
In a study investigating how the amount of fertilizer affects crop yield, the amount of fertilizer would be the explanatory variable.
Least Squares Regression Line (LSRL)
The line that best models a linear relationship between two quantitative variables by minimizing the sum of the squared differences between the observed and predicted y-values.
Example:
When analyzing the relationship between hours studied and exam scores, the LSRL would be the specific line that minimizes the total squared prediction errors for all students in the dataset.
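The LSRL coefficients have closed-form solutions, sketched below on a hypothetical hours-studied/exam-score dataset (the data are assumed for illustration):

```python
# Hypothetical data: hours studied (x) and exam score (y)
x = [1, 2, 3, 4, 5]
y = [65, 70, 78, 85, 92]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Slope minimizing the sum of squared prediction errors
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
# The LSRL always passes through (mean_x, mean_y)
a = mean_y - b * mean_x
print(f"yhat = {a:.1f} + {b:.1f}x")
```

For these numbers the fitted line is ŷ = 57.3 + 6.9x: each additional hour of study predicts about 6.9 more points.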
Residuals
The vertical differences between the observed (actual) y-values and the predicted y-values (ŷ) from the regression line: residual = observed y − predicted y. They represent the errors in the model's predictions.
Example:
If a student scored 90 on a test, but the regression line predicted 85 based on their study time, the residual for that student would be 90 - 85 = 5 points.
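The residual calculation is simply observed minus predicted for each point, sketched here with hypothetical scores:

```python
# Hypothetical observed exam scores and the scores predicted by a regression line
observed = [90, 72, 85]
predicted = [85, 75, 85]

# residual = observed - predicted for each data point
residuals = [obs - pred for obs, pred in zip(observed, predicted)]
print(residuals)  # a positive residual means the model underpredicted
```

The first residual is 90 − 85 = 5, matching the example above; the second is negative because the line overpredicted that student's score.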
Response Variable (y)
The variable whose outcome is being predicted or explained by the explanatory variable. It is typically plotted on the vertical (y) axis.
Example:
In a study investigating how the amount of fertilizer affects crop yield, the crop yield would be the response variable.
Slope (b)
The predicted change in the response variable (y) for every one-unit increase in the explanatory variable (x). It indicates the direction and steepness of the linear relationship.
Example:
In a model predicting a car's fuel efficiency (MPG) based on its weight (in 1000s of pounds), a slope of -3 means that for every additional 1000 pounds, the car's predicted fuel efficiency decreases by 3 MPG.
Standard Deviation of the Residuals (s)
A measure of the typical distance of the observed y-values from the least squares regression line; roughly, the size of a typical prediction error. It quantifies how accurate the model's predictions tend to be.
Example:
If the standard deviation of the residuals for a model predicting house prices is $15,000, then the model's predicted prices are typically off by about $15,000 from the actual selling prices.
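A sketch of the computation, using hypothetical residuals: s is the square root of the sum of squared residuals divided by n − 2 (the degrees of freedom for a two-parameter line):

```python
import math

# Hypothetical residuals from a fitted regression line (n = 6 data points)
residuals = [5, -3, 2, -4, 1, -1]
n = len(residuals)

sse = sum(e ** 2 for e in residuals)  # sum of squared residuals
s = math.sqrt(sse / (n - 2))          # divide by n - 2, not n
print(round(s, 2))
```

Here s ≈ 3.74, so this hypothetical model's predictions are typically off by a little under 4 units.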
y-intercept (a)
The predicted value of the response variable (y) when the explanatory variable (x) is zero. It's the point where the regression line crosses the y-axis.
Example:
If a regression line predicts plant height based on days of growth, a y-intercept of 2 cm would mean the predicted height of a plant at 0 days (seedling stage) is 2 cm.