Exploring Two-Variable Data
What does the least squares regression line minimize?
The sum of squared residuals
The range of residuals
The product of squared residuals and regression coefficients
The sum of absolute deviations from the mean
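For intuition, here is a minimal NumPy sketch (hypothetical data) showing that nudging the fitted slope or intercept in either direction only increases the sum of squared residuals:

```python
import numpy as np

# Hypothetical data chosen for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares fit: chooses slope b and intercept a to
# minimize sum((y - (b*x + a))**2).
b, a = np.polyfit(x, y, 1)

def ssr(slope, intercept):
    """Sum of squared residuals for a candidate line."""
    return np.sum((y - (slope * x + intercept)) ** 2)

print(ssr(b, a))        # SSR at the least squares line (the minimum)
print(ssr(b + 0.1, a))  # nudging the slope increases SSR
print(ssr(b, a + 0.1))  # nudging the intercept increases SSR
```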
What statistical measure can be used to evaluate whether there is evidence for a significant linear relationship between two quantitative variables?
Pearson's correlation coefficient (r)
P-value from an ANOVA test
Exponential growth rate
Chi-square statistic
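As a sketch of how this is assessed in practice (assuming SciPy is available; the data values are hypothetical), `scipy.stats.pearsonr` returns both r and a p-value testing whether the true correlation is zero:

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

r, p = pearsonr(x, y)  # r: strength/direction; p: test of H0 that rho = 0
print(r, p)
```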
In a study examining the relationship between ice cream sales and drowning incidents, researchers found a strong positive correlation; which of the following is the best explanation for this relationship?
The increase in temperature is a confounding variable that affects both ice cream sales and drowning incidents.
There is no plausible explanation; the correlation must be coincidental and not indicative of any relationship.
Ice cream sales cause an increase in drowning incidents due to people swimming shortly after eating.
Drowning incidents lead to an increase in ice cream sales as a coping mechanism for stress.
In assessing whether to use least squares regression, why is it important to examine a scatterplot before calculating further statistics?
To confirm if there's a linear relationship justifying regression analysis.
To calculate the correlation coefficient visually without statistical software.
To determine if any transformations are needed to achieve normality.
To identify any possible causation between variables depicted in data points.
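A quick sketch of that check (assuming Matplotlib; the data here are synthetic and deliberately nonlinear, so a straight-line fit would be inappropriate):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = x ** 2 + rng.normal(0, 5, 50)  # curved pattern, not linear

plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Inspect the scatterplot before fitting a regression line")
plt.show()
```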
What is the term for the line that minimizes the sum of the squared differences between the data points and the line itself in a scatterplot?
Mode-mode line.
Average deviation line.
Least squares regression line.
Median-median line.
Which statistical measure best describes how well a least squares regression model fits with actual data?
Coefficient of determination (r^2).
Pearson's correlation coefficient (r).
Standard deviation of x-values (s_x).
Mean absolute deviation (MAD).
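A NumPy sketch (hypothetical data) of the definition: r^2 is the fraction of the total variation in y explained by the fitted line, computed as 1 - SSE/SST:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.2, 3.8, 6.1, 7.9, 10.2])

b, a = np.polyfit(x, y, 1)
y_hat = b * x + a

sse = np.sum((y - y_hat) ** 2)       # variation left unexplained by the line
sst = np.sum((y - y.mean()) ** 2)    # total variation in y
r_squared = 1 - sse / sst            # coefficient of determination
print(r_squared)
```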
When checking the assumptions needed to construct confidence intervals around predicted y-values from a least squares regression line, observations with high leverage typically require extra scrutiny because they tend to have what impact on these interval estimates?
Observations identified as high-leverage solely because of extreme predictor values may not affect the robustness of the fitted model unless they are also accompanied by large residuals.
They disproportionately influence the slope and intercept estimates, potentially distorting the width and location of the corresponding confidence intervals.
Points with high Cook's distance should be reconsidered for inclusion, primarily because they exert undue influence on the overall model parameters.
High-leverage points generally warrant additional investigation, since their presence alone can signal altered outcome distributions even without a notable contribution to the total sum of squared errors (SSE).
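To make leverage and influence concrete, here is a minimal NumPy sketch (illustrative data with one extreme-x point; Cook's distance is computed from its standard formula):

```python
import numpy as np

# Hypothetical data: the last observation has an extreme x-value.
x = np.array([1.0, 2.0, 3.0, 4.0, 15.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 14.0])

X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
leverage = np.diag(H)                       # h_ii: leverage of each point

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
p = X.shape[1]                              # number of fitted parameters
mse = np.sum(resid ** 2) / (len(x) - p)

# Cook's distance: D_i = (e_i^2 / (p * mse)) * h_ii / (1 - h_ii)^2
cooks_d = (resid ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2

print(leverage)  # the x = 15 observation has much higher leverage
print(cooks_d)   # a large D_i flags an influential observation
```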

If the correlation coefficient (r) for a set of data is 0.8, what is the value of the coefficient of determination (r^2)?
-0.8
0.4
0.64
-0.64
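The arithmetic behind the answer: the coefficient of determination is the square of the correlation coefficient, so r^2 = (0.8)^2 = 0.64.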
Is it possible for two different sets of bivariate data to have identical least squares regression lines and yet differ substantially in their plots?
No, as long as their correlation coefficients are equal.
Yes, but only if one dataset contains more points than another does.
Yes, if they have different spread or outliers but similar overall trends.
No, because identical lines mean they must have identical plots.
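Anscombe's quartet is the classic demonstration: four small datasets with essentially the same regression line (about y = 3 + 0.5x) and correlation, yet strikingly different scatterplots. A sketch, assuming seaborn is installed (it ships the quartet as a sample dataset):

```python
import numpy as np
import seaborn as sns

df = sns.load_dataset("anscombe")  # four datasets labeled I through IV

for name, group in df.groupby("dataset"):
    slope, intercept = np.polyfit(group["x"], group["y"], 1)
    # Each group fits roughly y = 0.5x + 3 despite very different plots.
    print(name, round(slope, 2), round(intercept, 2))
```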
Which scenario would most likely result in a least squares regression line with greater slope volatility given changes to input data points?
Data distributed evenly along a clear linear trend without significant outliers.
A large dataset where outliers have minimal impact on slope determination.
A small dataset where each individual point heavily influences slope calculation.
Multiple datasets yielding similar slopes when separate regression analyses are performed.
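A NumPy sketch (synthetic data) of why small samples produce volatile slopes: perturbing one observation shifts the fitted slope far more in a 5-point dataset than in a 500-point one:

```python
import numpy as np

rng = np.random.default_rng(42)

def slope_shift(n):
    """Change in fitted slope after perturbing a single y-value."""
    x = np.linspace(0, 10, n)
    y = 2 * x + rng.normal(0, 1, n)
    slope_before = np.polyfit(x, y, 1)[0]
    y[-1] += 10                       # perturb one observation
    slope_after = np.polyfit(x, y, 1)[0]
    return abs(slope_after - slope_before)

print(slope_shift(5))    # large shift: each point heavily sways the fit
print(slope_shift(500))  # small shift: one point barely moves the slope
```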