Exploring Two–Variable Data
What type of sampling is used when every individual has an equal chance to be selected?
Stratified sampling
Cluster sampling
Simple random sampling
Multi-stage sampling
Why should caution be exercised when interpreting r-squared (the coefficient of determination) as evidence for causation based on data collected through observational studies?
Observational studies do not control for lurking variables, which could influence both the explanatory and response variables, potentially inflating r-squared without any causal link.
Correlation coefficients derived from observational data typically underestimate true cause-and-effect relationships, leading to deflated r-squared values even when causation exists.
The computation method for r-squared inherently includes confounding factors that always imply direct causation when high values are observed in observational studies.
Controlled experiments do not use r-squared because it is only applicable in non-experimental contexts like observational studies, where causation can be inferred directly from it.
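To make the lurking-variable idea concrete, here is a minimal sketch in Python. The data are hypothetical: a variable z drives both x and y, and y never depends on x directly. The helper names (r_squared, residuals_on) are illustrative, not from any library.

```python
import math

def r_squared(xs, ys):
    """Squared linear correlation between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

def residuals_on(zs, vs):
    """Residuals of v after fitting a least-squares line on z."""
    n = len(zs)
    z_bar = sum(zs) / n
    v_bar = sum(vs) / n
    szz = sum((z - z_bar) ** 2 for z in zs)
    szv = sum((z - z_bar) * (v - v_bar) for z, v in zip(zs, vs))
    slope = szv / szz
    return [v - (v_bar + slope * (z - z_bar)) for z, v in zip(zs, vs)]

# Hypothetical lurking variable z; x and y each depend on z only.
zs = list(range(50))
xs = [2 * z + math.sin(z) for z in zs]          # x is driven by z
ys = [5 - 3 * z + math.cos(2 * z) for z in zs]  # y is driven by z, not by x

r2_raw = r_squared(xs, ys)
# Remove the part of x and y explained by z, then correlate what is left.
r2_adjusted = r_squared(residuals_on(zs, xs), residuals_on(zs, ys))

print(f"r-squared of y on x:          {r2_raw:.3f}")
print(f"r-squared after adjusting z:  {r2_adjusted:.3f}")
```

Under these assumptions the raw r-squared is close to 1, yet it nearly vanishes once z is accounted for, which is why a high r-squared from observational data is not evidence of causation on its own.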
What kind of pattern in a residual plot would suggest that a linear model may not be appropriate for the data?
A clear curved pattern.
Points equally distributed above and below the x-axis.
No distinct patterns or trends in residuals across different values of x.
A random scatter around zero.
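As an illustration of what a curved residual pattern looks like numerically, the following sketch fits a straight line to hypothetical data that follow y = x^2 exactly. The function names are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def line_residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical curved data: y = x^2
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

res = line_residuals(xs, ys)
print(res)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle. That U shape, rather than a random scatter around zero, is the signal that a straight line is missing curvature in the data.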
When evaluating regression models that include higher-degree polynomial terms because of curvature in the data, which significance test could indicate whether adding a cubic term improves the fit significantly over a quadratic model?
An r-squared comparison alone determines whether the cubic term is needed, without further testing.
The linear correlation coefficient increases sufficiently when the cubic term is added, compared with the quadratic model.
A partial F-test comparing the full cubic model against the reduced quadratic model could show an improvement in fit.
A simple t-test on the coefficient of the cubic term would suffice, assessed independently of the rest of the model.
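For readers who want to see the arithmetic behind a partial F-test, here is a rough sketch in plain Python. It assumes the usual statistic F = ((SSE_reduced - SSE_full) / q) / (SSE_full / (n - p)), where q is the number of added terms and p is the number of coefficients in the full model. The data, the alternating "noise", and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def poly_sse(xs, ys, degree):
    """Sum of squared residuals for a least-squares polynomial fit."""
    p = degree + 1
    # Normal equations (A^T A) beta = A^T y for columns 1, x, ..., x^degree.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(p)]
    beta = solve(ata, aty)
    fitted = [sum(b * x ** k for k, b in enumerate(beta)) for x in xs]
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))

def partial_f(xs, ys, reduced_degree, full_degree):
    """Partial F statistic comparing nested polynomial models."""
    sse_reduced = poly_sse(xs, ys, reduced_degree)
    sse_full = poly_sse(xs, ys, full_degree)
    q = full_degree - reduced_degree      # number of added terms
    df_full = len(xs) - (full_degree + 1)  # residual df of the full model
    return ((sse_reduced - sse_full) / q) / (sse_full / df_full)

# Hypothetical data: 13 points with a deterministic alternating "noise" term.
xs = [-3 + 0.5 * i for i in range(13)]
noise = [0.5 * (-1) ** i for i in range(13)]
ys_cubic = [x ** 3 + e for x, e in zip(xs, noise)]      # truly cubic
ys_quadratic = [x ** 2 + e for x, e in zip(xs, noise)]  # truly quadratic

f_cubic = partial_f(xs, ys_cubic, 2, 3)
f_quadratic = partial_f(xs, ys_quadratic, 2, 3)
print(f_cubic, f_quadratic)
```

With 1 and 9 degrees of freedom, the 5% critical value of F is about 5.12. In this sketch the statistic for the cubic data lands far above that threshold, while the statistic for the quadratic data stays far below it, so the cubic term would be kept only in the first case.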
Which pattern in a residual plot would suggest adding higher-order terms (like quadratic or cubic) to create a better fitting statistical model?
A completely random pattern showing no identifiable trend or structure.
Clear clustering around certain values on the x-axis without any apparent curve.
A consistent horizontal banding suggesting perfectly equal variance among errors.
A curved pattern indicating that the relationship between the variables may be non-linear.
When fitting a nonlinear model, which aspect should be assessed before inferring causation between the predictor and response variables?
Assuming direct cause-and-effect based solely on model fit.
Looking for confounding variables that could explain any observed association.
Accepting causation because of a high coefficient of determination (r-squared).
Relying purely on p-values without considering practical significance.
When analyzing residuals to check for linearity, what feature suggests a departure from linearity?
Residuals spread equally above and below the horizontal axis at all values of x.
Randomly distributed residuals around the horizontal axis without any apparent pattern.
Constant variance of residuals as x increases.
A systematic pattern in the residual plot.

Why is it important to consider effect size along with sample size when designing a study to investigate linear relationships?
Small sample sizes might not detect small but meaningful effects, while large sample sizes could find significant results for effects that aren't truly important.
Effect size has no bearing on study design as sample size alone can ensure valid statistical conclusions are drawn.
Focusing solely on effect size while ignoring sample size is recommended since it is the only measure needed to prove a strong relationship.
Large sample sizes always increase the chance of finding true positive relationships, even when the true effect is absent or weak.
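The interplay between effect size and sample size can be seen in the standard t statistic for testing a correlation, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below uses hypothetical values of r and n chosen only for illustration.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing whether a sample correlation r differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical scenarios (values chosen for illustration only).
t_tiny_effect_big_sample = t_for_correlation(0.05, 10_000)
t_moderate_effect_small_sample = t_for_correlation(0.30, 20)

print(round(t_tiny_effect_big_sample, 2))        # about 5.01
print(round(t_moderate_effect_small_sample, 2))  # about 1.33
```

A correlation of 0.05 explains only 0.25% of the variation, yet with 10,000 observations its t statistic clears the two-sided 5% cutoff of roughly 1.96. A correlation of 0.30 with 20 observations falls short of the cutoff of about 2.10 for 18 degrees of freedom. Statistical significance and practical importance are separate questions, so a study design needs to consider both.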
When examining a scatterplot of residuals that exhibits a clear pattern, which of the following is an appropriate conclusion about the model's fit?
The lack of pattern in the residuals confirms a good fit of the linear model.
The model does not adequately capture the relationship between the variables.
A transformation of variables is unnecessary since patterns in residuals are common.
Adding more data points will eliminate any patterns seen in the residuals.
What are outlier points?
Points that improve the overall fit of the regression model.
Points with x-values that are far away from the rest of the data.
Points with low-magnitude residuals.
Points with y-values that are far away from the rest of the data.