Exploring Two-Variable Data
Why should caution be exercised when interpreting r² (the coefficient of determination) as evidence for causation based on data collected through observational studies?
Observational studies do not control for lurking variables, which could influence both the explanatory and response variables, potentially inflating r² without any causal linkage.
Correlation coefficients derived from observational data typically underestimate true cause-and-effect relationships, leading to deflated r² values even when causation exists.
The computation method for r² inherently includes confounding factors that always imply direct causation when high r² values are observed in observational studies.
Controlled experiments do not utilize r² because it is only applicable in non-experimental contexts like observational studies, where causation can be inferred directly from it.
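To see why the first option is the right concern, here is a minimal simulation sketch (NumPy only; the lurking variable z and all coefficients are hypothetical) in which z drives both x and y, producing a high r² even though x has no causal effect on y:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lurking variable z drives both the explanatory and the
# response variable; x has no direct effect on y.
z = rng.normal(size=500)
x = 2 * z + rng.normal(scale=0.5, size=500)
y = 3 * z + rng.normal(scale=0.5, size=500)

r = np.corrcoef(x, y)[0, 1]
print(f"r^2 = {r**2:.3f}")  # high r^2, yet x does not cause y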
Which pattern in a residual plot would suggest adding higher-order terms (like quadratic or cubic) to create a better-fitting statistical model?
A completely random pattern showing no identifiable trend or structure.
Clear clustering around certain values on the x-axis without any apparent curve.
A consistent horizontal banding suggesting perfectly equal variance among errors.
A curved pattern indicating that the relationship between the variables may be non-linear.
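A quick way to see the curved-residual signal is to fit a straight line to data generated from a quadratic trend; in this sketch (NumPy, with made-up coefficients) the printed residuals are positive at the ends of the x-range and negative in the middle:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data with a genuinely quadratic trend.
x = np.linspace(0, 10, 100)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)

# Fit a straight line and inspect the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Residuals run positive at the ends of the x-range and negative in the
# middle: the curved pattern that points to a missing quadratic term.
print(np.round(residuals[::10], 2))
```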
When fitting a nonlinear model, which aspect should be assessed before inferring causation between the predictor and the response variable?
Assuming direct cause-and-effect based solely on model fit.
Looking for confounding variables that could explain any observed association.
Accepting causation because of a high coefficient of determination (r²).
Relying purely on p-values without considering practical significance.
What kind of pattern in a residual plot would suggest that a linear model may not be appropriate for the data?
A clear curved pattern.
Points equally distributed above and below the x-axis.
No distinct patterns or trends in residuals across different values of x.
A random scatter around zero.
In evaluating regression models where higher-degree polynomials have been included due to curvature in the data, what significance test could indicate whether including a cubic term improves the fit significantly over just the quadratic?
An r² comparison alone determines whether the cubic term is necessary, without further testing.
The linear correlation coefficient increases sufficiently when the cubic term is added, compared with the prior model.
A partial F-test comparing the full cubic model against the reduced quadratic model could show improvement in fit.
Simple t-tests on the coefficients of the individual cubic terms would suffice, independent of the assessment context.
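The partial F-test from the correct option can be run with statsmodels' compare_f_test on nested ordinary least squares fits; the data below are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 120)
y = 1 + 2 * x + 1.5 * x**2 + 0.8 * x**3 + rng.normal(scale=2.0, size=x.size)

# Reduced model: quadratic.  Full model: adds the cubic term.
X_quad = sm.add_constant(np.column_stack([x, x**2]))
X_cubic = sm.add_constant(np.column_stack([x, x**2, x**3]))

quad_fit = sm.OLS(y, X_quad).fit()
cubic_fit = sm.OLS(y, X_cubic).fit()

# Partial F-test of the full model against the reduced (nested) model.
f_stat, p_value, df_diff = cubic_fit.compare_f_test(quad_fit)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}, extra df = {df_diff}")
```

A small p-value here would indicate that the cubic term explains significantly more variation than the quadratic model alone.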
What type of sampling is used when every individual has an equal chance to be selected?
Stratified sampling
Cluster sampling
Simple random sampling
Multi-stage sampling
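In code, a simple random sample is just an unweighted draw without replacement; a minimal sketch with Python's random module (the population of 100 labeled individuals is hypothetical):

```python
import random

# Hypothetical population of 100 labeled individuals.
population = list(range(1, 101))

# Simple random sample of size 10: every individual has the same
# chance of selection, drawn without replacement.
sample = random.sample(population, k=10)
print(sorted(sample))
```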
Which type of point can significantly change the slope of a regression line?
High-leverage point
Low-leverage point
Outlier point
Influential point
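An influential point is, by definition, one whose inclusion noticeably changes the fitted line. The sketch below (NumPy, simulated data) adds a single point with extreme x that also falls far off the trend, and compares the slopes with and without it:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=30)
y = 2 * x + rng.normal(scale=1.0, size=x.size)

slope_before, _ = np.polyfit(x, y, deg=1)

# One added point with extreme x (high leverage) that also lies far
# from the trend of the rest of the data -- an influential point.
x_new = np.append(x, 25.0)
y_new = np.append(y, 0.0)
slope_after, _ = np.polyfit(x_new, y_new, deg=1)

print(f"slope without the point: {slope_before:.2f}")
print(f"slope with the point:    {slope_after:.2f}")
```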

When selecting a transformation for a regression model, what should be considered?
The number of data points
The y-intercept of the model
Residual plots and the r² value
Magnitude of influential points
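One way to weigh a candidate transformation is to compare r² (alongside the residual plots) before and after transforming. This sketch assumes simulated exponential growth, where taking the log of y roughly linearizes the relationship:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated exponential growth with multiplicative noise.
x = np.linspace(1, 10, 80)
y = 3 * np.exp(0.4 * x) * rng.lognormal(sigma=0.1, size=x.size)

def linear_r2(u, v):
    """r^2 of the least-squares line of v on u."""
    r = np.corrcoef(u, v)[0, 1]
    return r**2

print(f"r^2, raw y:  {linear_r2(x, y):.3f}")
print(f"r^2, log(y): {linear_r2(x, np.log(y)):.3f}")  # log transform linearizes
```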
Why is it important to check for departures from linearity when analyzing bivariate quantitative data?
To confirm that all outliers have been removed from the dataset.
To calculate new values for x-variable coefficients.
To ensure that linear models are appropriate for making predictions.
To increase the correlation coefficient to improve model accuracy.
When increasing the sample size in a linear regression analysis, what is the most likely impact on the width of a confidence interval for the slope of the regression line?
The distribution shape changes.
The width increases.
Variability increases.
The width decreases.
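A quick simulation makes the last option concrete: the standard error of the slope shrinks roughly like 1/√n, so the interval narrows as the sample size grows. This sketch uses scipy.stats.linregress on simulated data (all parameters are made up):

```python
import numpy as np
from scipy import stats

def slope_ci_width(n, conf=0.95, seed=0):
    """Width of a confidence interval for the regression slope at sample size n."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, size=n)
    y = 2 * x + rng.normal(scale=3.0, size=n)
    fit = stats.linregress(x, y)
    t_crit = stats.t.ppf((1 + conf) / 2, df=n - 2)
    return 2 * t_crit * fit.stderr  # width = 2 * margin of error

print(f"n = 20:  width = {slope_ci_width(20):.3f}")
print(f"n = 200: width = {slope_ci_width(200):.3f}")  # narrower with larger n
```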