Glossary
Bivariate Data
Data that involves two different quantitative variables, typically analyzed to understand the relationship or association between them.
Example:
To see if there's a connection between the number of hours students spend on social media and their GPA, you would collect bivariate data.
Clusters
Distinct groups of data points that are close together within a scatterplot, suggesting different subgroups or patterns within the data.
Example:
A scatterplot of commute time versus distance might show two clusters: one for urban commuters and another for suburban commuters, each with different travel patterns.
Direction (of a scatterplot)
Indicates the general trend of the relationship between the variables as you move from left to right, either positive (increasing) or negative (decreasing).
Example:
A scatterplot showing that as daily temperatures increase, ice cream sales also tend to increase, exhibits a positive direction.
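One way to check direction numerically is to look at the sign of the slope of a least-squares line fitted to the points. The sketch below is illustrative only: the temperature and sales figures are made up, and the helper names (least_squares_slope, direction) are hypothetical.

```python
from statistics import mean

def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def direction(xs, ys):
    """Label the direction by the sign of the fitted slope."""
    slope = least_squares_slope(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "no clear direction"

# Hypothetical data: daily high temperature and ice cream cones sold.
temps = [60, 65, 70, 75, 80, 85, 90]
sales = [20, 24, 31, 35, 42, 50, 55]
print(direction(temps, sales))
```

A slope sign only summarizes a roughly linear pattern; for a curved form, the scatterplot itself is the better guide.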
Explanatory Variable (x)
The variable that is used to explain or predict changes in another variable, without necessarily causing them; it is plotted on the horizontal (x) axis of a scatterplot.
Example:
In a study examining how fertilizer amount affects plant growth, the amount of fertilizer used would be the explanatory variable.
Form (of a scatterplot)
Describes the overall shape or pattern of the data points in a scatterplot, typically categorized as linear or curved.
Example:
If the points on a scatterplot roughly follow a straight line, we would describe its form as linear.
High Leverage Point
A data point with an x-value that is far from the mean of the other x-values, giving it the potential to 'pull' the regression line towards itself.
Example:
If you're plotting the relationship between hours worked and salary, an executive who works an unusually high number of hours compared to everyone else would be a high leverage point.
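The definition can be sketched in code by comparing each x-value with the mean of the remaining x-values. The glossary does not say how far is "far", so the cutoff used here (3 sample standard deviations of the other x-values) is an assumption, as are the data and the function name high_leverage_indices.

```python
from statistics import mean, stdev

def high_leverage_indices(xs, threshold=3.0):
    """Indices of points whose x-value is far from the mean of the other x-values.

    "Far" is an assumption here: more than `threshold` sample standard
    deviations of the other x-values.
    """
    flagged = []
    for i, x in enumerate(xs):
        others = xs[:i] + xs[i + 1:]
        spread = stdev(others)
        if spread > 0 and abs(x - mean(others)) > threshold * spread:
            flagged.append(i)
    return flagged

# Hypothetical weekly hours worked; the last value is the executive.
hours = [38, 40, 42, 39, 41, 40, 80]
print(high_leverage_indices(hours))
```

Only the x-values are examined, which matches the definition: leverage depends on where a point sits horizontally, not on its y-value.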
Influential Point
A data point that, if removed, would substantially change the slope or y-intercept of the regression line, often because of its extreme x-value or its position relative to the trend.
Example:
In a dataset of car prices versus age, an extremely old, rare classic car with a very high price could be an influential point because it might pull the regression line upwards.
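A simple way to test whether a point is influential is to fit the least-squares line twice, once with the point and once without it, and compare the results. The sketch below uses invented car data (age in years, price in thousands of dollars) and a hypothetical helper, fit_line.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data; the last car is the old, expensive classic.
ages = [1, 2, 3, 4, 5, 6, 40]
prices = [30, 27, 25, 22, 20, 18, 90]

slope_with, intercept_with = fit_line(ages, prices)
slope_without, intercept_without = fit_line(ages[:-1], prices[:-1])

print(f"with classic car:    slope={slope_with:.2f}, intercept={intercept_with:.2f}")
print(f"without classic car: slope={slope_without:.2f}, intercept={intercept_without:.2f}")
```

With these made-up numbers, the slope is negative for the ordinary cars alone and turns positive once the classic car is included, which is the kind of change the definition describes.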
Outlier
A data point that lies far away from the general pattern of the other points in a scatterplot, potentially indicating an error or an unusual observation.
Example:
If most students study 1 to 5 hours for a test but one student studies 20 hours and still scores far below what the overall trend predicts, that student's data point would be an outlier on a scatterplot of study time vs. score.
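One common way to make "far from the general pattern" concrete is the residual: the vertical distance between a point and a line fitted to the data. The glossary itself does not define residuals, so treat this as an assumption. The data and helper names below are also hypothetical. The sketch picks out the point with the largest absolute residual as the candidate outlier.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residuals(xs, ys):
    """Observed y minus the y predicted by the fitted line, for each point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical study hours and test scores; the last student scored
# far below others who studied a similar amount.
study_hours = [1, 2, 3, 4, 5, 3]
scores = [60, 68, 75, 83, 90, 40]

res = residuals(study_hours, scores)
candidate = max(range(len(res)), key=lambda i: abs(res[i]))
print(candidate, round(res[candidate], 2))
```

The largest residual only nominates a candidate; whether it counts as an outlier is still a judgment made by looking at the scatterplot.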
Response Variable (y)
The variable that is measured or observed to see if it responds to changes in the explanatory variable; it is plotted on the vertical (y) axis of a scatterplot.
Example:
If you're tracking how daily exercise impacts resting heart rate, the resting heart rate would be the response variable.
Scatterplot
A graphical display used to visualize the relationship between two quantitative variables, with each point representing a pair of observed values.
Example:
A teacher might create a scatterplot to visually inspect the relationship between the number of hours students studied for a test and their scores on that test.
Strength (of a scatterplot)
Describes how closely the points in a scatterplot follow the identified form, ranging from strong (points packed tightly around the pattern) to weak (points widely scattered around it).
Example:
If all the data points on a scatterplot fall almost perfectly on a straight line, the linear relationship is very strong.
Unusual Features (of a scatterplot)
Any patterns or individual points that deviate from the overall trend in a scatterplot, such as outliers or clusters.
Example:
When analyzing a scatterplot of house prices versus square footage, a single house that is much more expensive than others of similar size would be an unusual feature.