Glossary
Exponential Models
Nonlinear models of the form ŷ = abˣ, in which the response variable grows or decays by a constant percentage for each unit increase in x. They are linearized by taking the natural logarithm of the y-values, which gives ln ŷ = ln a + x ln b.
Example:
The spread of a virus often follows an exponential model, where the number of infected individuals increases rapidly over time.
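The linearization in the definition can be sketched in a few lines. This is a minimal illustration, not a standard routine: the case counts are made up (they double each day), and `fit_line` is a hypothetical helper that performs ordinary least squares.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up illustrative data: 5 cases on day 0, doubling every day.
days = [0, 1, 2, 3, 4, 5]
cases = [5 * 2 ** d for d in days]

# Fit a line to (x, ln y), then undo the logarithm to recover a and b.
slope, intercept = fit_line(days, [math.log(c) for c in cases])
a = math.exp(intercept)  # starting value
b = math.exp(slope)      # growth factor per day
print(a, b)
```

Because the data are exactly exponential, the recovered values should match a = 5 and b = 2 up to floating-point rounding.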
High-Leverage Points
Points with x-values that are far from the other x-values in the dataset.
Example:
When analyzing car weight vs. fuel efficiency, a data point for a monster truck (extremely high weight) would be a high-leverage point.
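One common way to put a number on "far from the other x-values" in simple linear regression is the leverage value h = 1/n + (x - x̄)² / Σ(x - x̄)². The sketch below assumes that formula and uses made-up car weights; the function name `leverages` is hypothetical.

```python
def leverages(xs):
    """Leverage of each point in a simple (one-predictor) linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Made-up weights in pounds; the last entry plays the monster truck.
weights_lb = [2500, 2800, 3000, 3200, 3500, 10000]
h = leverages(weights_lb)

for weight, lev in zip(weights_lb, h):
    print(weight, round(lev, 3))
```

The monster truck's leverage comes out far larger than that of any ordinary car, which is what marks it as a high-leverage point.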
Influential Points
Data points that, if removed, would significantly change the slope, y-intercept, or correlation of the regression model.
Example:
In a study of study hours vs. test scores, a student who studied 100 hours and scored 50% might be an influential point because that single data point pulls the regression line sharply downward.
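Influence can be checked directly by fitting the line with and without the suspect point and comparing the results. The sketch below does this with made-up scores; `fit_line` is a hypothetical least-squares helper.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: six students whose scores rise 5 points per study hour.
hours = [1, 2, 3, 4, 5, 6]
scores = [60, 65, 70, 75, 80, 85]

# Fit once without, and once with, the student who studied 100 hours and scored 50.
slope_without, intercept_without = fit_line(hours, scores)
slope_with, intercept_with = fit_line(hours + [100], scores + [50])

print("slope without the point:", round(slope_without, 3))
print("slope with the point:   ", round(slope_with, 3))
```

With these numbers the slope flips from positive to negative once the extra student is included, so the point is influential.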
Nonlinear Models
Statistical models used when the relationship between variables is not a straight line, often requiring data transformation to achieve linearity.
Example:
The growth of a bacterial colony over time often follows a curve, requiring a nonlinear model like an exponential one to accurately predict future population sizes.
Outliers
Points whose observed y-values are far from the values predicted by the regression line, meaning they have large residuals.
Example:
If most students score between 70% and 90% on a test, but one student scores 20% despite studying an average amount, that student's score would be an outlier.
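A rough way to screen for outliers is to compare each residual with the typical residual size. The sketch below flags points whose residual exceeds twice the residual standard deviation. That cutoff is an assumed rule of thumb, not part of the definition, and the data are made up.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: the student with an average 4 hours of study scored 20.
hours = [1, 2, 3, 4, 5, 6, 7]
scores = [70, 73, 76, 20, 82, 85, 88]

slope, intercept = fit_line(hours, scores)
residuals = [y - (intercept + slope * x) for x, y in zip(hours, scores)]

# Residual standard deviation with n - 2 degrees of freedom.
s = math.sqrt(sum(r ** 2 for r in residuals) / (len(hours) - 2))

# Assumed rule of thumb: flag residuals larger than 2s in absolute value.
flagged = [i for i, r in enumerate(residuals) if abs(r) > 2 * s]
print(flagged)
```

Only the student who scored 20 is flagged. Because that student's x-value sits at the mean, the point shifts the line downward without changing its slope.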
Power Models
Nonlinear models of the form ŷ = axᵇ, in which the response variable is proportional to a power of the explanatory variable. They are linearized by taking the natural logarithm of both the x- and y-values, which gives ln ŷ = ln a + b ln x.
Example:
The relationship between the area of a square and its side length (Area = side²) is a simple power model.
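The square example makes a convenient check of the log-log linearization, since the true values a = 1 and b = 2 are known in advance. This is a minimal sketch; `fit_line` is a hypothetical least-squares helper.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

sides = [1, 2, 3, 4, 5]
areas = [s ** 2 for s in sides]

# Fit a line to (ln x, ln y): the slope estimates b, the intercept estimates ln a.
slope, intercept = fit_line([math.log(s) for s in sides],
                            [math.log(area) for area in areas])
a = math.exp(intercept)
b = slope
print(a, b)
```

Here the slope of the transformed line is the exponent itself, whereas in an exponential model the slope is the logarithm of the growth factor.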
Residual Plots
A scatterplot of the residuals (observed y - predicted y) against the explanatory variable (x) or the predicted y-values, used to assess the appropriateness of a regression model.
Example:
If a residual plot shows a clear U-shaped pattern, it indicates that a linear model is not a good fit for the data, and a nonlinear model might be more appropriate.
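The U-shaped pattern can be seen numerically, without drawing the plot, by fitting a line to curved data and listing the residuals in order of x. The sketch below uses made-up data that follow y = x² exactly; `fit_line` is a hypothetical least-squares helper.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up curved data: y = x squared.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)

# Residual = observed y - predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
for x, r in zip(xs, residuals):
    print(x, round(r, 3))
```

The residuals are positive at both ends and negative in the middle (roughly 5, 0, -3, -4, -3, 0, 5), which is the U shape that signals a poor linear fit.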
R² Value
The coefficient of determination, which represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
Example:
An R² value of 0.85 for a model predicting house prices based on square footage means that 85% of the variation in house prices can be explained by the linear relationship with square footage.
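For a least-squares line, R² can be computed as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squared deviations of y from its mean. The sketch below uses made-up house data and also confirms that, with one explanatory variable, R² equals the square of the correlation r.

```python
import math

# Made-up data: square footage and price in thousands of dollars.
sqft = [1000, 1500, 2000, 2500, 3000]
price_k = [200, 270, 310, 400, 430]

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price_k) / n
sxx = sum((x - mean_x) ** 2 for x in sqft)
syy = sum((y - mean_y) ** 2 for y in price_k)  # SST
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price_k))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# SSE: variation in price left unexplained by the line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(sqft, price_k))

r_squared = 1 - sse / syy
r = sxy / math.sqrt(sxx * syy)
print(round(r_squared, 3), round(r ** 2, 3))
```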
Transform (data transformation)
The process of applying a mathematical function to data (e.g., logarithm, square root) to make a nonlinear relationship linear, allowing for linear regression analysis.
Example:
To analyze the relationship between a planet's distance from the sun and its orbital period, astronomers might transform both variables using logarithms to reveal a linear pattern.
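The planetary example can be tried with a few approximate, commonly quoted values. The distances in astronomical units and periods in years below are assumptions added for illustration, as is the hypothetical `fit_line` helper.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Approximate values for Mercury, Venus, Earth, Mars, and Jupiter (assumed).
distance_au = [0.387, 0.723, 1.000, 1.524, 5.203]
period_yr = [0.241, 0.615, 1.000, 1.881, 11.862]

# Transform both variables with the natural logarithm, then fit a line.
slope, intercept = fit_line([math.log(d) for d in distance_au],
                            [math.log(p) for p in period_yr])
print(round(slope, 3), round(intercept, 3))
```

The fitted slope comes out close to 1.5, which suggests a power model in which the period is proportional to distance raised to the 3/2 power.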