Glossary
Causation
A relationship where one variable directly causes a change in another variable. Correlation does not imply causation.
Example:
Ice cream sales and shark attacks may be strongly correlated, but it is unlikely that one causes the other; a third variable, such as hot weather, is more plausibly the underlying cause of both.
Correlation
A statistical measure that describes the strength and direction of a linear relationship between two quantitative variables.
Example:
A strong positive correlation between the number of hours a student studies and their exam score suggests that more study hours tend to be associated with higher scores.
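As a rough illustration of how the strength and direction of a linear relationship can be quantified, the sketch below computes the Pearson correlation coefficient r from its definition. The function name `pearson_r` and the study-hours data are hypothetical, made up only for this example.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either variable has no variation.
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [62, 70, 75, 83, 90]

r = pearson_r(hours, scores)
print(round(r, 3))  # close to +1: a strong positive linear association
```

A value of r near +1 or -1 indicates a strong linear relationship, and the sign gives the direction. It says nothing on its own about whether one variable causes the other.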
Explanatory variable
The variable that is thought to explain or predict changes in the response variable. It is typically plotted on the x-axis.
Example:
In a study examining how the amount of fertilizer affects crop yield, the amount of fertilizer is the explanatory variable.
Extrapolation
Using the least squares regression line to make predictions for explanatory variable values that are outside the range of the original data used to create the line.
Example:
Using a regression model built only from data on children aged 1 to 10 years old to make a prediction for a 40-year-old would be a dangerous extrapolation, because 40 is far outside the range of ages used to create the line.
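One way to guard against accidental extrapolation is to check each requested x-value against the range of the original data before trusting the prediction. The sketch below assumes a hypothetical regression line for height (cm) versus age (years) fitted on children aged 1 to 10; the coefficients and the function name `predict_with_range_check` are invented for illustration.

```python
def predict_with_range_check(a, b, x, x_min, x_max):
    """Return (predicted value, True if x lies outside the data range)."""
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Hypothetical line: predicted height = 75 + 6 * age, fitted on ages 1 to 10.
intercept, slope = 75.0, 6.0
age_min, age_max = 1, 10

print(predict_with_range_check(intercept, slope, 5, age_min, age_max))
# (105.0, False): age 5 is inside the range of the data

print(predict_with_range_check(intercept, slope, 40, age_min, age_max))
# (315.0, True): a 315 cm adult is absurd, and the flag warns of extrapolation
```

The line keeps producing numbers outside the data range, but nothing in the data supports the assumption that the same linear pattern continues there.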
Interpretation in context
Explaining the meaning of statistical results, such as the slope or y-intercept of a regression line, using the specific variables and units of the problem.
Example:
When asked to interpret the slope of a regression line for fertilizer and corn yield, stating 'for every additional kilogram of fertilizer per hectare, the predicted corn yield increases by 0.2 tons per hectare' is an example of interpretation in context.
Least squares regression line (LSRL)
The line that best describes the linear relationship between two variables by minimizing the sum of the squared vertical distances between the actual data points and the line.
Example:
A scientist calculated the least squares regression line to model the relationship between the number of hours a plant is exposed to sunlight and its growth in centimeters.
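The least squares criterion can be sketched in a few lines using the standard formulas b = Sxy / Sxx and a = ȳ − b·x̄. The sunlight and growth numbers below are hypothetical, and the helper names `fit_lsrl` and `sum_squared_errors` are chosen only for this example.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept: the line passes through (mean_x, mean_y)
    return a, b

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours of sunlight and plant growth in centimeters.
sunlight_hours = [2, 4, 6, 8]
growth_cm = [3.0, 5.0, 6.0, 9.0]

a, b = fit_lsrl(sunlight_hours, growth_cm)
best = sum_squared_errors(a, b, sunlight_hours, growth_cm)

# Any other line, such as one with a slightly steeper slope, gives a larger sum.
other = sum_squared_errors(a, b + 0.1, sunlight_hours, growth_cm)
print(best < other)  # True
```

For this data the fitted line is approximately ŷ = 1.0 + 0.95x. Nudging either coefficient away from those values increases the sum of squared vertical distances, which is what "least squares" means.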
Predicted value (ŷ)
The value of the response variable estimated by the least squares regression line for a given value of the explanatory variable. It is denoted ŷ, read 'y-hat'.
Example:
If the LSRL for study hours and exam scores predicts a student studying 8 hours will score 92, then 92 is the predicted value for that student's exam score.
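A predicted value is obtained by substituting an x-value into the regression equation ŷ = a + bx. In the sketch below, the intercept of 60 and slope of 4 are hypothetical coefficients, chosen only so that 8 study hours gives the predicted score of 92 from the example; the example itself does not state the equation.

```python
def predict(a, b, x):
    """Return the predicted value y-hat = a + b * x."""
    return a + b * x

# Hypothetical LSRL: predicted score = 60 + 4 * (hours studied).
intercept = 60.0
slope = 4.0

y_hat = predict(intercept, slope, 8)
print(y_hat)  # 92.0
```

The predicted value is an estimate from the line, so an actual student who studies 8 hours may score above or below 92.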
Response variable
The variable that measures an outcome or result of a study. It is typically plotted on the y-axis.
Example:
In a study examining how the amount of fertilizer affects crop yield, the crop yield is the response variable.
Slope (b)
The predicted change in the response variable for each one-unit increase in the explanatory variable.
Example:
If the slope of a regression line for car weight (x) and fuel efficiency (y) is -0.5 miles per gallon per 100 pounds, it means for every additional 100 pounds, the car's fuel efficiency is predicted to decrease by 0.5 mpg.
Y-intercept (a)
The predicted value of the response variable when the explanatory variable is zero, representing where the regression line crosses the y-axis.
Example:
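In a regression of daily ice cream sales (y) versus temperature (x), a y-intercept of 100 means that predicted daily sales are 100 when the temperature is zero, though this might be an unrealistic prediction if a temperature of zero lies outside the range of the data.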