Glossary
Chi-Square Goodness of Fit Test
A hypothesis test used to determine if an observed distribution of a single categorical variable matches a hypothesized or expected distribution.
Example:
To check if the distribution of M&M colors in a bag matches the proportions claimed by the company, you would use a Chi-Square Goodness of Fit Test.
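A minimal Python sketch of the arithmetic behind this test, assuming the standard statistic χ² = Σ (O - E)² / E with df = (number of categories - 1). The color counts and "claimed" proportions below are hypothetical, not real company figures. The function name `chi_square_gof` is illustrative. Turning the statistic into a p-value still requires a chi-square table, a calculator, or a library, which this sketch does not do.

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, df, expected counts) for a goodness-of-fit test."""
    if len(observed) != len(expected_props):
        raise ValueError("need one hypothesized proportion per category")
    if abs(sum(expected_props) - 1) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    # Expected count for each category = sample size * hypothesized proportion.
    expected = [n * p for p in expected_props]
    # Large-counts condition: every expected count should be at least 5.
    if min(expected) < 5:
        raise ValueError("some expected counts are below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # categories minus one
    return stat, df, expected


# Hypothetical bag of 60 candies in three colors, tested against made-up
# claimed proportions of 50%, 30%, and 20%.
stat, df, expected = chi_square_gof([25, 20, 15], [0.5, 0.3, 0.2])
print(round(stat, 3), df)  # prints 1.806 2
```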
Chi-Square Test for Homogeneity
A hypothesis test used to determine if the distribution of a single categorical variable is the same across multiple populations or groups.
Example:
To see if the proportion of students who pass a standardized test is the same across three different high schools, a Chi-Square Test for Homogeneity is used.
Chi-Square Test for Independence
A hypothesis test used to determine if there is a statistically significant association or relationship between two categorical variables.
Example:
To investigate if there's a relationship between a person's preferred social media platform and their age group, you would use a Chi-Square Test for Independence.
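The homogeneity and independence tests use the same arithmetic on a two-way table of counts: each expected count is (row total * column total) / grand total, the statistic sums (O - E)² / E over every cell, and df = (rows - 1)(columns - 1). They differ in how the data were collected and how the hypotheses are worded, not in the computation. The sketch below is a minimal illustration using a hypothetical table of pass/fail counts for three schools; `chi_square_two_way` is an illustrative name, and only the statistic and df are computed.

```python
def chi_square_two_way(table):
    """Return (chi-square statistic, df, expected counts) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected count under H0: (row total * column total) / grand total.
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            stat += (observed - e) ** 2 / e
        expected.append(expected_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Hypothetical counts: rows are three high schools, columns are (pass, fail).
table = [[40, 10], [35, 15], [30, 20]]
stat, df, expected = chi_square_two_way(table)
print(round(stat, 3), df)  # prints 4.762 2
```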
Conditions for Inference
Specific criteria (Randomness, Independence, Normality/Large Sample) that must be met for the results of an inference procedure to be valid.
Example:
Before performing any t-test, you must check the conditions for inference, ensuring your sample is random and the data distribution is approximately normal or the sample size is large enough.
Confidence Interval
A range of plausible values for an unknown population parameter, constructed with a specified level of confidence.
Example:
A 95% confidence interval for the mean height of adult males might be (68 inches, 70 inches), meaning we are 95% confident the true mean height falls within this range.
Context
Referring to the specific real-world scenario or problem being analyzed, requiring statistical conclusions to be stated in terms of the problem's variables and units.
Example:
When interpreting a confidence interval for average tree height, you must state your conclusion in the context of 'trees' and 'height in meters,' not just numbers.
Degrees of Freedom (df)
A value that specifies the number of independent pieces of information used to estimate a parameter or calculate a statistic, often related to sample size.
Example:
For a one-sample t-test with 25 observations, the degrees of freedom (df) would be 24 (n-1).
Estimating a Parameter
Using sample data to construct a confidence interval that provides a plausible range of values for an unknown population parameter.
Example:
A pollster might be estimating a parameter by creating a confidence interval for the true proportion of voters who support a particular candidate.
Hypotheses ($H_0$, $H_a$)
Statements about a population parameter that are tested in a hypothesis test; $H_0$ is the null hypothesis (no effect/difference), and $H_a$ is the alternative hypothesis (there is an effect/difference).
Example:
For a test of a new drug, the hypotheses might be $H_0$: the drug has no effect on blood pressure, and $H_a$: the drug lowers blood pressure.
Inference Procedure
A statistical method used to draw conclusions or make predictions about a population based on sample data.
Example:
Choosing the correct inference procedure is crucial for determining if a new teaching method significantly improves test scores.
Linear Regression T-Interval
A confidence interval used to estimate the true slope of the population regression line, which describes the linear relationship between two quantitative variables.
Example:
After collecting data on study hours and exam scores, you could construct a Linear Regression T-Interval to estimate the true increase in score for each additional hour studied.
Linear Regression T-Test
A hypothesis test used to determine if there is a statistically significant linear relationship between two quantitative variables, specifically testing if the slope of the true regression line is zero.
Example:
To determine if the amount of fertilizer used significantly predicts crop yield, you would perform a Linear Regression T-Test on the slope.
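Both regression procedures rest on the least-squares slope b and its standard error SE_b = s / sqrt(Σ(x - x̄)²), where s is the residual standard deviation computed with n - 2 in the denominator. The test statistic for H0: β = 0 is t = b / SE_b with df = n - 2, and the interval is b ± t* SE_b. This is a minimal sketch on hypothetical study-hours data; `slope_inference` is an illustrative name, and t* is passed in from a t table rather than computed.

```python
import math


def slope_inference(x, y, t_star):
    """Least-squares slope b, its standard error, the t statistic for H0: beta = 0,
    df = n - 2, and the interval b +/- t_star * SE_b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual standard deviation uses n - 2 because two parameters (a, b) were estimated.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(sxx)
    t = b / se_b
    interval = (b - t_star * se_b, b + t_star * se_b)
    return b, se_b, t, n - 2, interval


# Hypothetical data: hours studied vs. exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 71, 74, 80]
# t* = 3.182 is the 95% critical value for df = 3, taken from a t table.
b, se_b, t, df, interval = slope_inference(hours, scores, t_star=3.182)
print(round(b, 2), round(se_b, 4), round(t, 2), df)
```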
Matched Pairs T-Test
A hypothesis test used when data are collected in pairs, such as before-and-after measurements on the same subjects, to analyze the mean difference.
Example:
To assess if a new diet plan causes a significant change in weight, you would use a Matched Pairs T-Test on participants' weights before and after the diet.
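A matched pairs t-test is a one-sample t-test on the differences: compute d = after - before for each subject, then t = d̄ / (s_d / sqrt(n)) with df = n - 1. The sketch below assumes that subtraction order and uses hypothetical weights for four participants; `matched_pairs_t` is an illustrative name.

```python
import math
import statistics


def matched_pairs_t(before, after):
    """t statistic for H0: mean difference = 0, using differences after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    se = s_d / math.sqrt(n)
    t = d_bar / se
    return d_bar, se, t, n - 1


# Hypothetical weights (lb) for four participants before and after the diet.
before = [180, 200, 165, 190]
after = [176, 195, 166, 184]
d_bar, se, t, df = matched_pairs_t(before, after)
print(d_bar, round(t, 3), df)  # prints -3.5 -2.251 3
```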
Number of Groups
Indicates how many distinct populations or samples are being compared in an inference procedure.
Example:
When comparing the average commute times of urban versus suburban residents, the number of groups is two.
One Proportion Z-Interval
A confidence interval used to estimate the true proportion of a single population based on sample data.
Example:
After surveying 100 people, you could construct a One Proportion Z-Interval to estimate the percentage of the population that owns a pet.
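The interval has the form p̂ ± z* sqrt(p̂(1 - p̂) / n). Here is a minimal sketch using Python's standard-library `statistics.NormalDist` to get z*. The survey result of 62 pet owners out of 100 is hypothetical, and `one_prop_z_interval` is an illustrative name.

```python
import math
from statistics import NormalDist


def one_prop_z_interval(successes, n, confidence=0.95):
    p_hat = successes / n
    # Large-counts condition: at least 10 successes and 10 failures.
    if successes < 10 or n - successes < 10:
        raise ValueError("large-counts condition not met")
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se


# Hypothetical survey: 62 of 100 people own a pet.
low, high = one_prop_z_interval(62, 100)
print(round(low, 3), round(high, 3))  # prints 0.525 0.715
```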
One Proportion Z-Test
A hypothesis test used to determine if a sample proportion is significantly different from a hypothesized population proportion.
Example:
To see whether the proportion of students who prefer online learning differs from a claimed 60%, you would use a One Proportion Z-Test.
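The test statistic is z = (p̂ - p0) / sqrt(p0(1 - p0) / n). The standard error uses p0 rather than p̂ because the calculation assumes H0 is true. The sketch below is a minimal illustration with a hypothetical sample of 130 students out of 200. `one_prop_z_test` and its `alternative` parameter are illustrative names, not a particular library's API.

```python
import math
from statistics import NormalDist


def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    p_hat = successes / n
    # The standard error uses p0 because the test assumes H0 is true.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Hypothetical sample: 130 of 200 students prefer online learning; H0: p = 0.60.
z, p_value = one_prop_z_test(130, 200, 0.60)
print(round(z, 3), round(p_value, 3))
```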
One Sample T-Interval
A confidence interval used to estimate the true mean of a single population when the population standard deviation is unknown.
Example:
To estimate the average weight of a certain species of fish in a lake, you could catch a sample and construct a One Sample T-Interval.
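The interval is x̄ ± t* s / sqrt(n), with df = n - 1. Python's standard library has no t-distribution quantile function, so this minimal sketch takes t* as an argument looked up from a t table. The fish weights are hypothetical, and `one_sample_t_interval` is an illustrative name.

```python
import math
import statistics


def one_sample_t_interval(data, t_star):
    """x_bar +/- t_star * s / sqrt(n); t_star must come from a t table with df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    return x_bar - t_star * se, x_bar + t_star * se


# Hypothetical fish weights (kg); n = 5, so df = 4 and the 95% t* is 2.776.
weights = [1.2, 1.5, 1.1, 1.4, 1.3]
low, high = one_sample_t_interval(weights, t_star=2.776)
print(round(low, 3), round(high, 3))  # prints 1.104 1.496
```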
One Sample T-Test
A hypothesis test used to determine if a sample mean is significantly different from a hypothesized population mean when the population standard deviation is unknown.
Example:
If a school wants to know if their students' average SAT score is different from the national average of 1000, they would perform a One Sample T-Test.
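The test statistic is t = (x̄ - μ0) / (s / sqrt(n)) with df = n - 1. This minimal sketch stops at the statistic. A p-value would come from a t table or a statistics library. The five scores are hypothetical, and `one_sample_t_test` is an illustrative name.

```python
import math
import statistics


def one_sample_t_test(data, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    return (x_bar - mu0) / se, n - 1


# Hypothetical SAT scores for five students, tested against mu0 = 1000.
scores = [1040, 980, 1100, 1020, 1060]
t, df = one_sample_t_test(scores, mu0=1000)
print(round(t, 3), df)  # prints 2.0 4
```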
Point Estimate
A single value calculated from sample data that is used to estimate an unknown population parameter.
Example:
The sample mean of 75 is a point estimate for the true average score of all students.
Reject the Null Hypothesis
The decision made in a hypothesis test when the p-value is less than the significance level, indicating sufficient evidence against the null hypothesis.
Example:
If the p-value for a new drug's effectiveness is very low, we would reject the null hypothesis, concluding that there is convincing evidence the drug is effective.
Significance Level
A predetermined threshold (alpha, often 0.05) used in hypothesis testing to decide whether to reject the null hypothesis.
Example:
If the significance level is set at 0.05, and your p-value is 0.02, you would reject the null hypothesis.
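The decision rule fits in a few lines. This is a minimal sketch with an illustrative function name, `decide`. Note that the alternative outcome is worded "fail to reject H0," not "accept H0."

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with a significance level chosen before looking at the data."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # A large p-value is not evidence that H0 is true, so the wording is "fail to reject".
    return "fail to reject H0"


print(decide(0.02))  # prints reject H0
print(decide(0.20))  # prints fail to reject H0
```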
Slope
In a linear regression model, the slope represents the estimated change in the dependent variable for every one-unit increase in the independent variable.
Example:
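If the slope of the regression line predicting house price (in dollars) from house size (in square feet) is 100, the predicted price increases by $100 for each additional square foot of size.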
Standard Error
A measure of the variability or precision of a sample statistic (like a mean or proportion) as an estimate of a population parameter.
Example:
A small standard error for the sample mean indicates that the sample mean is likely a more precise estimate of the population mean.
Testing a Claim
A hypothesis test conducted to determine if there is enough statistical evidence to support or reject a specific statement about a population parameter.
Example:
A company might be testing a claim that their new battery lasts longer than 10 hours on average.
Two Proportion Z-Interval
A confidence interval used to estimate the difference between two population proportions.
Example:
Researchers might use a Two Proportion Z-Interval to estimate the difference in recovery rates between patients receiving a new drug versus a placebo.
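The interval is (p̂1 - p̂2) ± z* sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2). It uses the unpooled standard error, because no common proportion is being assumed. The recovery counts in this minimal sketch are hypothetical, and `two_prop_z_interval` is an illustrative name.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Hypothetical trial: 120 of 150 recover on the drug, 90 of 150 on the placebo.
low, high = two_prop_z_interval(120, 150, 90, 150)
print(round(low, 3), round(high, 3))  # prints 0.099 0.301
```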
Two Proportion Z-Test
A hypothesis test used to compare two population proportions to see if they are significantly different from each other.
Example:
To compare the success rates of two different marketing campaigns, a Two Proportion Z-Test would be appropriate.
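Unlike the interval, the test assumes H0: p1 = p2. For that reason it pools both samples into one proportion estimate before computing the standard error. This minimal sketch returns z and a two-sided p-value. The campaign counts are hypothetical, and `two_prop_z_test` is an illustrative name.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated by pooling the samples.
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical campaigns: 45 of 300 contacts converted for A, 30 of 300 for B.
z, p_value = two_prop_z_test(45, 300, 30, 300)
print(round(z, 3), round(p_value, 3))
```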
Two Sample T-Interval
A confidence interval used to estimate the difference between the means of two independent populations.
Example:
A researcher might construct a Two Sample T-Interval to estimate the difference in average plant growth under two different fertilizer types.
Two Sample T-Test
A hypothesis test used to compare the means of two independent populations to see if they are significantly different.
Example:
To determine if there's a significant difference in average test scores between students taught by two different teachers, a Two Sample T-Test is appropriate.
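Both two-sample t procedures use the unpooled standard error sqrt(s1²/n1 + s2²/n2). The test statistic is t = (x̄1 - x̄2) / SE, and the interval is (x̄1 - x̄2) ± t* SE. The df below come from the Welch-Satterthwaite formula, which is what calculators typically report. A common by-hand alternative is the conservative df = min(n1, n2) - 1. This is a minimal sketch on hypothetical scores, and `two_sample_t` is an illustrative name.

```python
import math
import statistics


def two_sample_t(sample1, sample2):
    """Unpooled two-sample t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)
    t = diff / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, t, df


# Hypothetical test scores for students of two different teachers.
teacher_a = [85, 90, 78, 92, 88]
teacher_b = [80, 75, 83, 79, 78]
diff, se, t, df = two_sample_t(teacher_a, teacher_b)
print(round(diff, 2), round(t, 3), round(df, 2))
```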
Variable Type
Refers to whether the data collected is categorical (qualitative, e.g., gender) or quantitative (numerical, e.g., height).
Example:
Before analyzing data, you must identify the variable type; for instance, 'favorite color' is categorical, while 'number of siblings' is quantitative.
p-value
The probability of observing sample results as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true.
Example:
If a p-value is 0.03, it means that, if the null hypothesis were true, there would be a 3% chance of obtaining results as extreme as or more extreme than ours, suggesting the results are unlikely to be due to chance alone.
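The "as extreme as, or more extreme than" wording can be made concrete with an exact binomial calculation. This is a different technique from the z-test approximation. This minimal sketch adds up the probabilities of every outcome at least as large as the observed one, assuming H0 is true. The coin-flip scenario is hypothetical, and `upper_tail_binomial_p_value` is an illustrative name.

```python
from math import comb


def upper_tail_binomial_p_value(successes, n, p0):
    """P(X >= successes) for X ~ Binomial(n, p0): the probability, computed assuming
    H0: p = p0 is true, of a result at least as extreme as the one observed
    (for the one-sided alternative Ha: p > p0)."""
    return sum(
        comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical: 60 heads in 100 flips of a coin, testing H0: p = 0.5 vs Ha: p > 0.5.
p_value = upper_tail_binomial_p_value(60, 100, 0.5)
print(round(p_value, 4))  # prints 0.0284
```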
t-score
A standardized test statistic used in t-procedures when the population standard deviation is unknown, indicating how many standard errors a sample statistic is from the hypothesized parameter.
Example:
When constructing a confidence interval for a mean, you use a critical t-score (t*) from the t-distribution based on your desired confidence level and degrees of freedom.