Collecting Data
How does clustering differ from stratification in terms of selecting samples?
Each individual is assigned to a predefined category
Involves several steps
A single-stage process
Entire groups (clusters) are chosen at once instead of individuals
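To make the contrast concrete, here is a minimal sketch of the two sampling schemes. The population of four schools with five students each, and the helper names `stratified_sample` and `cluster_sample`, are hypothetical and exist only for illustration.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, rng):
    """Draw per_stratum individuals at random from every stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Pick whole clusters at random and keep every individual in them."""
    clusters = {}
    for unit in population:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]

# Hypothetical population: 4 schools (A-D) with 5 students each.
population = [(school, i) for school in "ABCD" for i in range(5)]
rng = random.Random(0)

strat = stratified_sample(population, lambda u: u[0], 2, rng)
clus = cluster_sample(population, lambda u: u[0], 2, rng)

print("stratified:", strat)  # some students from every school
print("cluster:   ", clus)   # every student from a few schools
```

In the stratified draw every school contributes a few students. In the cluster draw only two schools appear, but all of their students are included.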
Given that a recent survey found that owners of electric cars earn above-average incomes, why should this not automatically lead us to assume that electric cars cause increased earnings?
Individuals interested in technology advancements naturally gravitate towards owning electric vehicles, resulting in greater income opportunities.
People earning above-average incomes feel financial pressure and demonstrate wealth through purchasing expensive items like electric cars.
Lurking variables such as education level and environmental consciousness may correlate both with choosing electric vehicles and earning higher incomes.
Electric car ownership boosts social status, therefore indirectly increases chances of professional advancement and career growth.
What type of bias occurs when certain groups are underrepresented in a survey because they are less likely to be chosen for the sample?
Response bias
Nonresponse bias
Undercoverage bias
Voluntary response bias
Which of the following best describes a simple random sample?
Only individuals meeting certain criteria are chosen.
Every individual has an equal chance of being selected.
Samples are taken at regular intervals from the population.
Individuals are chosen based on convenience.
Which measure indicates the most frequent value in a data set?
Mean absolute deviation
Interquartile range (IQR)
Standard error
Mode
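As a quick illustration, Python's standard `statistics` module can report the most frequent value directly. The data values below are made up for the example.

```python
import statistics

# Made-up data set; 7 appears three times, more than any other value.
data = [2, 3, 3, 5, 7, 7, 7, 9]

most_frequent = statistics.mode(data)
all_modes = statistics.multimode(data)  # list form, useful when there are ties

print(most_frequent)
print(all_modes)
```

`multimode` returns every value tied for the highest count, which helps when a data set has more than one mode.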
In assessing whether a complex statistical model is overfitting compared to a simpler one when predicting new data points, what key evidence should analysts look for?
The simpler model requires fewer parameters yet predicts training data with slightly better accuracy than complex models.
The complex and simpler models yield more accurate predictions as more variables are introduced into each respective analysis framework.
Both models achieve identical levels of accuracy when applied to large-scale validation datasets after training completion.
The complex model exhibits high accuracy on training data but significantly lower accuracy on validation or test data than the simpler model.
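The train-versus-validation comparison can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the data are synthetic (a noisy straight line), the "simple" model is a least-squares line, and a 1-nearest-neighbour lookup stands in for an overly flexible model because it memorises the training set.

```python
import random

rng = random.Random(42)

def make_data(n):
    """Synthetic data (an assumption): y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_nearest_neighbour(xs, ys):
    """Overly flexible model: return the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared prediction error."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(200)
val_x, val_y = make_data(200)

simple = fit_line(train_x, train_y)
flexible = fit_nearest_neighbour(train_x, train_y)

for name, model in [("simple", simple), ("flexible", flexible)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "validation MSE:", round(mse(model, val_x, val_y), 3))
```

The memorising model reproduces its training data perfectly, so its training error is zero, yet it still makes errors on the held-out points. A large gap between the two errors is the warning sign of overfitting.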
When analyzing if there is any significant linear relationship between SAT scores and first-year college GPA, which statistical tool assesses both strength and direction while also considering outliers?
Scatterplot with added calculation of Pearson's correlation coefficient (r) along with identification of potential outliers or influential points.
Calculation of Spearman's rank correlation coefficient without visual inspection or consideration of possible outliers or influential points.
Creation of side-by-side boxplots comparing SAT scores against discrete ranges or quartiles defined by college GPA levels.
Computing a regression line exclusively based on median values rather than individual data points across SAT scores and GPAs.
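A small sketch shows why outliers have to be inspected alongside Pearson's r. The SAT and GPA numbers below are invented for illustration and are not real data; `pearson_r` is a hand-written helper applying the usual formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented scores: GPA rises steadily with SAT.
sat = [1000, 1100, 1200, 1300, 1400, 1500]
gpa = [2.6, 2.8, 3.1, 3.3, 3.5, 3.8]

r_clean = pearson_r(sat, gpa)

# Add one hypothetical outlier: a very high SAT score with a very low GPA.
r_with_outlier = pearson_r(sat + [1550], gpa + [1.5])

print("r without outlier:", round(r_clean, 3))
print("r with outlier:   ", round(r_with_outlier, 3))
```

A single unusual point is enough to pull r far away from the value suggested by the rest of the data, which is why the scatterplot and the coefficient are read together.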

When a researcher uses a linear regression model to predict outcomes, which of the following is not an assumption that needs checking for the model to be considered reliable?
The residuals should be approximately normally distributed for each value of the independent variable.
There must be linearity between the independent and dependent variables.
The sum of residuals must equal zero.
The variance of residual terms should be constant across all levels of the independent variable.
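The statement about the sum of residuals can be checked numerically. For a least-squares line that includes an intercept, the residuals sum to zero by construction, whatever the data look like. The sketch below uses deliberately curved, made-up data to show this; `fit_line` is a hand-written helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Made-up data that is clearly NOT linear (y = x squared).
xs = list(range(10))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# The sum is zero (up to floating-point rounding) even though the
# straight line is a poor description of these data.
print("sum of residuals:", sum(residuals))
```

Because the sum is zero for any fitted line of this kind, it carries no information about whether the model is appropriate, unlike linearity, constant variance, or normality of the residuals.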
When assessing whether observational studies produce valid conclusions regarding causality, which factor undermines this goal most directly?
Application of blinded designs where subjects don't know if they're receiving treatment or placebo
Implementation of control groups explicitly designed to compare different conditions within the study
Use of technology like computer algorithms that randomly assign treatment groups among study participants
Presence of confounding variables linked both with treatment exposure and outcome measures being studied
If a researcher uses a linear regression model to predict outcomes on new datasets and notices that the residuals are consistently patterned instead of randomly scattered, what is this an indication of?
Every point in the dataset lies exactly on the regression line.
There is no correlation between the independent and dependent variables.
The variance of the error terms is consistent across all levels of the independent variable.
The linear model may not be appropriate for the data.
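A patterned residual plot can be reproduced with a few lines of code. The sketch below assumes made-up data that follow a curve (y = x squared) and fits a straight line to them; the residual signs then form long runs instead of alternating at random.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Made-up curved data fitted with a straight line.
xs = list(range(10))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Summarise the residual plot as a string of signs, ordered by x.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)  # ++------++
```

Positive residuals at both ends and negative residuals in the middle trace a U shape, which suggests the relationship is curved and a straight line may not be the right model.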