Glossary
Bad Model Fit (Residual Plot)
Indicated by a residual plot where the residuals show a clear pattern (e.g., curved, funnel-shaped), suggesting that a linear model is not appropriate for the data.
Example:
If a residual plot for predicting a runner's speed based on training time showed a clear U-shaped curve, it would indicate a bad model fit, meaning a linear model isn't capturing the true relationship.
Good Model Fit (Residual Plot)
Indicated by a residual plot where the residuals are randomly scattered around zero with no discernible pattern, suggesting that a linear model is appropriate for the data.
Example:
A good model fit would be evident if a residual plot for predicting plant growth based on sunlight hours showed points scattered like stars in the night sky, without any curves or trends.
Least Squares Criterion
The principle used in linear regression to find the line that minimizes the sum of the squared differences between the observed values and the values predicted by the model.
Example:
When fitting a line to data, the least squares criterion ensures the line is positioned to have the smallest possible total squared errors, making it the 'best fit'.
Least Squares Regression Line (LSRL)
The unique straight line that best describes the linear relationship between two quantitative variables, found by minimizing the sum of the squared residuals.
Example:
When predicting a student's final exam score based on their midterm score, the Least Squares Regression Line provides the best linear equation to make those predictions.
Observed Value (y)
The actual, measured outcome for a given data point in a dataset, representing what truly happened.
Example:
In a study predicting house prices, the actual selling price of a house is its observed value.
Predicted Value (ŷ)
The value estimated by a regression model for a given input, based on the established linear relationship between variables.
Example:
Using a regression model, if you input a student's study hours, the score the model estimates they will get is the predicted value.
Residual
The difference between an observed value (actual) and the value predicted by a regression model (predicted), representing the model's error for that specific data point.
Example:
If a model predicted a student would score 85 on a test, but they actually scored 80, the residual would be -5.
Residual Plot
A scatterplot that displays the residuals on the vertical axis against the predictor (explanatory) variable or the predicted values on the horizontal axis.
Example:
To check if a linear model is appropriate for predicting car prices based on age, you would create a residual plot to visually inspect for patterns in the errors.