Glossary
Clustering
The phenomenon where data points in a scatterplot form distinct groups, often indicating underlying categories or segments within the data.
Example:
Observing Clustering in a scatterplot of income versus education level might reveal distinct groups for different educational attainment levels.
Clusters
Distinct groups or concentrations of points within a scatterplot that suggest different categories or subgroups in the data.
Example:
A scatterplot of student heights and weights might show two distinct Clusters, one for middle schoolers and one for high schoolers.
Confidence intervals
A range of values within which the true value of a prediction or parameter is expected to fall, stated with a chosen level of confidence (for example, 95%).
Example:
When predicting a student's future GPA, providing a Confidence interval like '3.0 to 3.4' conveys the uncertainty in the estimate, which a single point prediction does not.
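To make the idea concrete, here is a minimal sketch that computes an interval for the mean of a small sample. It assumes a normal approximation, which is crude for a sample this small, and it covers the interval for a mean rather than for one individual prediction. The function name and the GPA values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    n = len(values)
    center = mean(values)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = stdev(values) / sqrt(n)
    # Two-sided critical value (about 1.96 when level is 0.95).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return center - margin, center + margin


gpas = [3.0, 3.2, 3.4, 3.1, 3.3]  # hypothetical sample of GPAs
low, high = mean_confidence_interval(gpas)
print(f"95% interval for the mean GPA: {low:.2f} to {high:.2f}")
```

A higher confidence level (say 0.99) gives a wider interval, because more certainty requires a larger range.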
Correlation DOES NOT equal causation
A critical principle stating that just because two variables are related or move together, it does not mean one variable causes the other.
Example:
Ice cream sales and drowning incidents both increase in summer, but Correlation DOES NOT equal causation: neither causes the other, and both rise with a third factor, hot weather.
Extrapolating
Making predictions using the line of best fit for x-values that are outside the range of the observed data.
Example:
Using a line of best fit from data on ages 5-10 to predict the height of a 50-year-old would be Extrapolating and potentially unreliable.
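A short sketch of how a prediction can be flagged as extrapolation: compare the x-value with the range of the observed data. The slope, intercept, and function name below are hypothetical values chosen for illustration, not fitted from real measurements.

```python
def predict_with_range_check(x, slope, intercept, observed_xs):
    """Return (prediction, is_extrapolation) for the line y = slope * x + intercept."""
    prediction = slope * x + intercept
    # Any x outside the observed range counts as extrapolation.
    is_extrapolation = x < min(observed_xs) or x > max(observed_xs)
    return prediction, is_extrapolation


ages = [5, 6, 7, 8, 9, 10]    # observed ages in years
slope, intercept = 6.0, 80.0  # hypothetical line: height in cm versus age

inside = predict_with_range_check(8, slope, intercept, ages)
outside = predict_with_range_check(50, slope, intercept, ages)
print(inside)   # age 8 lies within the observed range
print(outside)  # age 50 lies far outside it
```

With these assumed numbers the line predicts a height of 380 cm at age 50, which shows why predictions outside the observed range should not be trusted.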
Form (of relationship)
Refers to the overall shape or pattern that the points in a scatterplot tend to follow.
Example:
The Form of the relationship between the age of a car and its resale value is typically nonlinear, showing a curve.
Gaps
Areas in a scatterplot where there are no data points, indicating a lack of observations within a certain range of the variables.
Example:
A scatterplot of house prices versus square footage might show Gaps if no houses of a certain size were sold in the dataset.
Line of best fit (trendline)
A straight line drawn through the center of a scatterplot's data points that best represents the overall linear relationship between the variables.
Example:
After plotting student data, drawing a Line of best fit helps predict a student's potential score based on their study time.
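One common way to choose the line, though not the only one, is least squares, which minimizes the sum of squared vertical distances from the points to the line. The sketch below assumes that method; the data and the name fit_line are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]        # hypothetical study time in hours
scores = [52, 61, 64, 72, 76]  # hypothetical test scores
slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # predicted score for 6 hours of study
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```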
Linear (relationship)
A type of relationship where the points in a scatterplot tend to follow a straight line pattern.
Example:
The relationship between the number of items sold and the total revenue often shows a Linear pattern.
Moderate Correlation
A correlation that shows a general trend with some noticeable variation, meaning the points are somewhat spread out but still follow a direction.
Example:
A scatterplot of daily steps taken and reported energy levels might show a Moderate Correlation, indicating a general trend but with individual differences.
Negative Correlation
A specific type of negative relationship where points generally move downwards from left to right, indicating an inverse association.
Example:
As the number of hours spent playing video games increases, a student's sleep duration tends to decrease, demonstrating a Negative Correlation.
Negative Relationship
A pattern in a scatterplot where as the values of one variable increase, the values of the other variable tend to decrease.
Example:
Observing that as the number of hours watching TV increases, a student's GPA tends to decrease, indicates a Negative Relationship.
No Correlation
A situation where points on a scatterplot are scattered randomly with no discernible direction or pattern, suggesting no linear relationship.
Example:
There is typically No Correlation between the number of pets a person owns and their favorite type of music.
No Relationship
A pattern in a scatterplot where there is no clear trend or connection between the two variables; points appear randomly scattered.
Example:
A scatterplot comparing a person's favorite color to their height would likely show No Relationship.
Nonlinear (relationship)
A type of relationship where the points in a scatterplot follow a curved pattern rather than a straight line.
Example:
The growth of bacteria over time often exhibits a Nonlinear (exponential) relationship.
Outliers
Individual data points in a scatterplot that lie far away from the general pattern of the other points.
Example:
On a scatterplot of student test scores versus study hours, a student who studied very little but scored exceptionally high would be an Outlier.
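Outliers are usually spotted by eye, but a rough numerical check is possible. The sketch below flags points whose vertical distance from a given line is more than twice the spread of all such distances. The cutoff of 2, the line, and the data are assumptions made for illustration, not a standard rule from this glossary.

```python
from statistics import pstdev


def flag_outliers(xs, ys, slope, intercept, cutoff=2.0):
    """Return the points lying unusually far from the line y = slope * x + intercept."""
    # Vertical distance of each point from the line (observed minus predicted).
    distances = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(distances)
    return [
        (x, y)
        for x, y, d in zip(xs, ys, distances)
        if abs(d) > cutoff * spread
    ]


hours = [1, 2, 3, 4, 5, 6, 1]          # hypothetical study hours
scores = [55, 60, 66, 70, 75, 81, 98]  # last student: little study, very high score
outliers = flag_outliers(hours, scores, slope=5, intercept=50)
print(outliers)
```

Only the student who studied one hour and scored 98 is flagged; every other point lies within one point of the assumed line.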
Perfect Linearity
A rare scenario where all data points in a scatterplot fall exactly on a straight line, indicating an exact linear relationship.
Example:
If you plot the circumference of a circle against its diameter, you would observe Perfect Linearity.
Positive Correlation
A specific type of positive relationship where points generally move upwards from left to right, indicating a direct association.
Example:
The more time a student spends on practice problems, the higher their test scores tend to be, showing a Positive Correlation.
Positive Relationship
A pattern in a scatterplot where as the values of one variable increase, the values of the other variable also tend to increase.
Example:
A scatterplot showing that more hours spent exercising are generally associated with more calories burned illustrates a Positive Relationship.
Quantitative variables
Variables that can be measured numerically, allowing for mathematical operations and meaningful comparisons.
Example:
Height, weight, age, and test scores are all examples of quantitative variables that can be plotted on a scatterplot.
Residual analysis
The process of examining the residuals, the differences between the observed y-values and the y-values predicted by the line of best fit, to assess how well the model fits the data.
Example:
Performing Residual analysis can help determine if a linear model is appropriate for the data or if a different type of relationship exists.
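As an illustration, the sketch below computes residuals (observed minus predicted) for data that actually follow a curve. The data are hypothetical; the slope of 6 and intercept of -7 are the least-squares values for these five points.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus the y predicted by the line y = slope * x + intercept."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]  # hypothetical data following y = x squared

# Least-squares line for these points: y = 6x - 7.
res = residuals(xs, ys, slope=6, intercept=-7)
print(res)
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than random scatter around zero, suggests that a straight line is not the right model for the data.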
Slope
The 'm' value in the equation y = mx + b, representing the predicted change in the y-variable for each one-unit increase in the x-variable.
Example:
If the Slope of a line of best fit for hours studied vs. test scores is 5, it means for every additional hour studied, the test score is predicted to increase by 5 points.
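The interpretation above can be checked directly: increasing x by one unit changes the prediction by exactly the slope. The slope of 5 comes from the example; the intercept of 50 is a hypothetical value added so the line is complete.

```python
def predict(hours, slope=5, intercept=50):
    """Predicted test score from the line y = slope * hours + intercept."""
    return slope * hours + intercept


# Difference between predictions one hour apart equals the slope.
change = predict(4) - predict(3)
# The prediction at zero hours equals the intercept.
at_zero = predict(0)
print(change, at_zero)
```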
Strength (of correlation)
Describes how closely the points in a scatterplot follow a particular trend or pattern.
Example:
If all data points fall almost perfectly on a straight line, the Strength of the correlation is very high.
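Strength can also be expressed as a number. One common measure, not defined in this glossary, is the Pearson correlation coefficient r, which ranges from -1 to 1. The sketch below computes it by hand; the cutoffs of 0.7 and 0.3 used to label strength are assumptions for illustration, since textbooks differ on where to draw these lines.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe_strength(r):
    """Label the strength of r using hypothetical cutoffs."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"


xs = [1, 2, 3, 4, 5]
on_a_line = [2, 4, 6, 8, 10]  # every point lies exactly on y = 2x
scattered = [3, 1, 4, 2, 3]   # hypothetical data with little pattern

print(describe_strength(pearson_r(xs, on_a_line)))
print(describe_strength(pearson_r(xs, scattered)))
```

The sign of r gives the direction (positive or negative), while its absolute value reflects how tightly the points follow a straight line.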
Strong (correlation)
Indicates that the points in a scatterplot are tightly clustered around a clear trend, suggesting a consistent and predictable relationship.
Example:
If a scatterplot of study hours and exam scores shows points very close to a straight line, it indicates a Strong correlation.
Weak (correlation)
Indicates that the points in a scatterplot are widely spread out around a trend, suggesting a loose or inconsistent relationship.
Example:
A scatterplot showing a Weak correlation between daily coffee intake and hours of sleep would have widely dispersed points.
Y-intercept
The 'b' value in the equation y = mx + b, representing the predicted value of the y-variable when the x-variable is zero.
Example:
In a line of best fit for temperature vs. ice cream sales, the Y-intercept would represent the predicted ice cream sales when the temperature is 0 degrees.
x-axis
The horizontal axis on a scatterplot, typically representing the independent variable or the first quantitative variable being observed.
Example:
When plotting hours studied versus exam scores, the number of hours studied would usually be displayed on the x-axis.
y-axis
The vertical axis on a scatterplot, typically representing the dependent variable or the second quantitative variable being observed.
Example:
In a scatterplot showing temperature and ice cream sales, the amount of ice cream sold would be plotted on the y-axis.