Glossary
10% Condition
A specific aspect of the independence condition: when sampling without replacement, the sample size should be no more than 10% of the population size so that observations can still be treated as approximately independent.
Example:
If you sample 50 students from a school of 1000, the 10% condition is met (50 ≤ 0.10 × 1000 = 100), allowing you to treat the observations as independent.
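A minimal sketch of this check in code (the function name and numbers are illustrative, not from any particular library):

```python
def meets_ten_percent_condition(sample_size, population_size):
    """Return True if the sample is at most 10% of the population."""
    return sample_size <= 0.10 * population_size

print(meets_ten_percent_condition(50, 1000))   # True: 50 <= 100, condition met
print(meets_ten_percent_condition(150, 1000))  # False: 150 > 100, condition violated
```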
Central Limit Theorem (CLT)
A fundamental theorem stating that for a sufficiently large sample size (typically n ≥ 30), the sampling distribution of the sample mean (or sample slope) will be approximately normal, regardless of the shape of the population distribution.
Example:
Even if individual student study times are skewed, the Central Limit Theorem ensures that the distribution of sample mean study times will be approximately normal for large samples.
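The theorem is easy to see in a quick simulation. This sketch (population distribution and sample size chosen arbitrarily) draws many samples from a strongly right-skewed exponential population and shows that the sample means are far less skewed:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
samples = rng.exponential(scale=3.0, size=(10_000, 40))  # right-skewed population
sample_means = samples.mean(axis=1)                      # 10,000 sample means, n = 40

print(f"skewness of raw draws:    {skew(samples.ravel()):.2f}")  # about 2: very skewed
print(f"skewness of sample means: {skew(sample_means):.2f}")     # much closer to 0
```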
Confidence Intervals
A range of values likely to contain the true population parameter, estimated from sample data, providing a measure of the precision and uncertainty of the estimate.
Example:
A 95% confidence interval for the average height of adult males might be (68 inches, 72 inches), suggesting the true average height is likely within this range.
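As a sketch (the height data here are invented), a t-interval for a mean follows the formula point estimate ± t* × standard error:

```python
import numpy as np
from scipy import stats

heights = np.array([69.1, 71.3, 70.2, 68.5, 72.0, 69.8, 70.9, 71.5])  # hypothetical
n = len(heights)
point_estimate = heights.mean()
se = heights.std(ddof=1) / np.sqrt(n)   # standard error of the mean
t_star = stats.t.ppf(0.975, df=n - 1)   # critical value for 95% confidence

lower, upper = point_estimate - t_star * se, point_estimate + t_star * se
print(f"95% CI: ({lower:.1f}, {upper:.1f}) inches")
```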
Constant Standard Deviation of y Condition
A condition for linear regression inference requiring that the spread of the residuals is roughly the same across all x-values; like the linear condition, it is checked by examining the residual plot.
Example:
If the residual plot shows a fanning-out pattern, the constant standard deviation of y condition is violated: the model's predictions are less precise at some x-values than at others.
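In practice you eyeball the residual plot, but as a rough numeric sketch (data simulated so the noise deliberately grows with x), you can compare the residual spread at low and high x-values:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # noise grows with x: a fanning pattern

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# If the condition held, these two spreads would be roughly equal.
print(f"residual SD at low x:  {residuals[x < 5].std(ddof=1):.2f}")
print(f"residual SD at high x: {residuals[x >= 5].std(ddof=1):.2f}")
```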
Degrees of Freedom
A value related to the sample size that determines the specific shape of the t-distribution, calculated as n - 2 for linear regression inference.
Example:
For a sample of 25 adults in a regression analysis, the degrees of freedom would be 25 - 2 = 23.
Independence Condition
A condition for inference requiring that observations in the sample are independent of each other, typically met by random sampling or a randomized experiment.
Example:
If students are randomly selected for a study, the independence condition is likely met, ensuring one student's data doesn't influence another's.
LinRegTInt
A calculator function specifically designed to construct a confidence interval for the slope of a linear regression line, streamlining the calculation process.
Example:
Instead of manually calculating the t-score and margin of error, you can use LinRegTInt on your calculator to quickly find the confidence interval for the slope.
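LinRegTInt itself is a TI calculator feature, but the same computation can be sketched in Python (the study-hours data are invented) using scipy's linregress, which reports the slope's standard error:

```python
import numpy as np
from scipy import stats

hours  = np.array([1, 2, 3, 4, 5, 6, 7, 8])          # hypothetical study hours
scores = np.array([62, 68, 66, 74, 78, 80, 85, 88])  # hypothetical exam scores

fit = stats.linregress(hours, scores)
t_star = stats.t.ppf(0.975, df=len(hours) - 2)   # 95% confidence, df = n - 2

lower = fit.slope - t_star * fit.stderr
upper = fit.slope + t_star * fit.stderr
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```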
Linear Condition
A condition for linear regression inference requiring that the true relationship between the independent and dependent variables is linear, often checked by examining the residual plot.
Example:
To check the linear condition, you would examine the residual plot for any curved patterns, which would indicate a non-linear relationship.
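A sketch of what a violation looks like (data constructed to be curved): fitting a straight line to y = x² leaves residuals with a systematic sign pattern instead of random scatter:

```python
import numpy as np

x = np.linspace(0, 10, 11)
y = x ** 2                     # a clearly non-linear relationship

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Positive at both ends, negative in the middle: a curved residual
# pattern, so the linear condition is violated.
print(np.round(residuals, 1))
```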
Margin of Error
The 'buffer zone' added to and subtracted from the point estimate to create a confidence interval, accounting for sampling variability and the desired confidence level.
Example:
If a survey reports a 50% approval rating with a margin of error of ±3%, the true approval rating is likely between 47% and 53%.
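In symbols, the interval is point estimate ± margin of error. A tiny sketch using the survey numbers from the example above:

```python
point_estimate = 0.50    # 50% approval rating
margin_of_error = 0.03   # plus-or-minus 3 percentage points

lower, upper = point_estimate - margin_of_error, point_estimate + margin_of_error
print(f"plausible range: {lower:.0%} to {upper:.0%}")   # 47% to 53%
```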
Normal Condition
A condition for linear regression inference requiring that the distribution of the residuals (or the y-values for each x) is approximately normal, especially important for smaller sample sizes.
Example:
To check the normal condition, you might look at a histogram or normal probability plot of the residuals for approximate symmetry and bell shape.
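A sketch of the second check (the residuals here are simulated stand-ins for residuals from a real fit), using scipy's normal probability plot; points falling near a straight line suggest approximate normality:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
residuals = rng.normal(0, 1.5, size=40)   # stand-in for real residuals

stats.probplot(residuals, plot=plt)       # points near the line => roughly normal
plt.title("Normal probability plot of residuals")
plt.show()
```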
Point Estimate
A single value calculated from sample data that serves as the best guess for an unknown population parameter, forming the center of a confidence interval.
Example:
If a sample of students shows an average study time of 3 hours, this 3 hours is the point estimate for the average study time of all students.
Population Parameter
A numerical characteristic of an entire population that researchers aim to estimate using sample data.
Example:
The true average income of all households in a city is a population parameter that researchers often try to estimate using samples.
Residuals
The differences between the observed y-values and the y-values predicted by the regression line (observed - predicted), representing the error in the model's prediction.
Example:
If a student scored 85 on a test, but the regression line predicted 80, their residual would be 5.
Slope of the Regression Line
In linear regression, the slope is the predicted change in the dependent variable for each one-unit increase in the independent variable.
Example:
If the slope of the regression line for study hours and exam scores is 5, it means for every additional hour studied, the exam score is predicted to increase by 5 points.
Standard Deviation of Residuals (s)
A measure of the typical distance between the observed y-values and the y-values predicted by the regression line, indicating how well the line fits the data.
Example:
If the standard deviation of residuals is small, it means the data points cluster closely around the regression line, indicating a good fit.
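The formula is s = sqrt(sum of squared residuals / (n - 2)). A short sketch with invented data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])   # hypothetical data

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

s = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))   # divide by n - 2, not n
print(f"standard deviation of residuals: s = {s:.3f}")
```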
Standard Error
A measure of the variability of a sample statistic, such as the sample slope, describing how much the statistic would be expected to vary from sample to sample around the true population parameter.
Example:
A small standard error of the slope suggests that the sample slope is a more precise estimate of the true population slope.
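For the slope, the standard error is s / sqrt(Σ(x − x̄)²), where s is the standard deviation of the residuals. A sketch continuing with the same invented data as above:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])   # hypothetical data

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)
s = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))

se_slope = s / np.sqrt(np.sum((x - x.mean()) ** 2))
print(f"standard error of the slope: {se_slope:.3f}")
```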
T-score
A critical value (often written t*) from the t-distribution, used in confidence intervals and hypothesis tests when the population standard deviation is unknown; it is determined by the confidence level and the degrees of freedom.
Example:
To construct a 95% confidence interval for a small sample, you'd look up the appropriate t-score based on your degrees of freedom.
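The critical value can be read from a t-table or computed directly; a sketch using scipy, with the degrees of freedom from the regression example above (df = 23):

```python
from scipy import stats

# 95% confidence leaves 2.5% in each tail, so look up the 0.975 quantile.
t_star = stats.t.ppf(0.975, df=23)
print(f"t* = {t_star:.3f}")   # about 2.069
```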