Hypothesis Testing Fundamentals
Hypothesis: A statement about a population parameter.
Null Hypothesis (H₀): A statement of no effect or no difference; it is assumed to be true until evidence indicates otherwise.
Alternative Hypothesis (H₁ or Hₐ): A statement that contradicts the null hypothesis; it represents what we are trying to find evidence for.
Types of Tests:
- One-Tailed (Right-Tailed): H₁: μ > value
- One-Tailed (Left-Tailed): H₁: μ < value
- Two-Tailed: H₁: μ ≠ value
|
Type I Error (α): Rejecting H₀ when it is actually true (False Positive).
Type II Error (β): Failing to reject H₀ when it is actually false (False Negative).
Significance Level (α): The probability of making a Type I error.
Power (1-β): The probability of correctly rejecting H₀ when it is false.
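To see these definitions in action, here is a minimal simulation sketch (Python with NumPy/SciPy; the sample size, number of trials, and α are arbitrary choices). When H₀ is true, a test at α = 0.05 should reject in roughly 5% of repeated samples; that rejection rate is exactly the Type I error rate.
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05            # significance level = Type I error rate
n, trials = 30, 10_000

# H0 is true here: every sample comes from a population with mean 0.
rejections = 0
for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
    if p_value <= alpha:
        rejections += 1  # false positive: rejecting a true H0

print(f"Empirical Type I error rate: {rejections / trials:.3f}")  # ≈ 0.05
```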
|
Steps in Hypothesis Testing:
- State the Hypotheses: Define H₀ and H₁.
- Determine the Test Statistic: Choose the appropriate test statistic (z, t, F, etc.).
- Set the Significance Level: Determine α (e.g., 0.05, 0.01).
- Calculate the Test Statistic: Compute the value of the test statistic from the sample data.
- Determine the p-value or Critical Value:
- p-value: The probability of observing a test statistic as extreme as, or more extreme than, the one computed if H₀ is true.
- Critical Value: The value(s) that define the rejection region.
- Make a Decision:
- p-value Method: If p-value ≤ α, reject H₀.
- Critical Value Method: If the test statistic falls in the rejection region, reject H₀.
- State the Conclusion: Interpret the decision in the context of the problem.
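To make the two decision rules concrete, here is a minimal sketch in Python (SciPy); the test statistic and α below are invented numbers for a two-tailed z-test. For the same α, the two methods always reach the same decision.
```python
from scipy import stats

alpha = 0.05
z = 2.31   # hypothetical computed test statistic (two-tailed z-test)

# p-value method: probability of a statistic at least this extreme under H0
p_value = 2 * stats.norm.sf(abs(z))
reject_by_p = p_value <= alpha

# Critical value method: rejection region is |z| > z_crit
z_crit = stats.norm.ppf(1 - alpha / 2)   # ≈ 1.96 for alpha = 0.05
reject_by_crit = abs(z) > z_crit

print(f"p-value = {p_value:.4f}, critical values = ±{z_crit:.2f}")
print(f"Reject H0? p-value: {reject_by_p}, critical value: {reject_by_crit}")
```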
|
Z-Test
- Population standard deviation (σ) is known.
- Sample size is large (n ≥ 30).
- Used for testing hypotheses about a single mean or comparing two means.
Test Statistic:
z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}
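A minimal sketch of this formula in code (the sample mean, σ, n, and hypothesized μ are invented, and a right-tailed alternative is assumed):
```python
import math
from scipy import stats

x_bar, mu0 = 52.8, 50.0   # hypothetical sample mean and hypothesized mean
sigma, n = 8.0, 36        # known population std. dev.; n ≥ 30

z = (x_bar - mu0) / (sigma / math.sqrt(n))   # z = (x̄ - μ) / (σ/√n)
p_value = stats.norm.sf(z)                   # right-tailed: H1: μ > 50

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```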
|
T-Test
- Population standard deviation (σ) is unknown.
- Sample size is small (n < 30).
- Used for testing hypotheses about a single mean or comparing two means.
Test Statistic:
t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}
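SciPy computes this statistic directly from raw data; here is a sketch with a small invented sample (σ unknown, n < 30, two-tailed H₁):
```python
from scipy import stats

# Hypothetical sample; H0: μ = 10 vs. H1: μ ≠ 10
sample = [9.8, 11.2, 10.5, 12.1, 10.9, 11.7, 10.2, 11.4]
t_stat, p_value = stats.ttest_1samp(sample, popmean=10.0)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
# With alpha = 0.05, reject H0 if p_value <= alpha.
```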
|
Correlation and Regression
Purpose: To measure the strength and direction of the linear relationship between two variables.
Pearson Correlation Coefficient (r):
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}
Interpretation:
- -1 ≤ r ≤ 1
- r > 0: Positive correlation
- r < 0: Negative correlation
- r ≈ 0: No linear correlation
- |r| close to 1: Strong correlation
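Here is a sketch of the computational formula above, cross-checked against scipy.stats.pearsonr (the paired data are invented for illustration):
```python
import math
from scipy import stats

# Hypothetical paired data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.6, 4.8, 5.1, 6.3]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# r = [nΣxy - ΣxΣy] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²])
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(f"r = {r:.4f}")
print(f"scipy check: r = {stats.pearsonr(x, y)[0]:.4f}")  # should agree
```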
|
Example: Calculating correlation between non-member price (x) and member price (y) for n = 8 pairs.
r ≈ 0.857 indicates a strong positive correlation. As non-member price increases, member price tends to increase as well.
|
Linear Regression
Purpose: To model the relationship between two variables and predict the value of one variable based on the other.
Regression Line Equation:
\hat{y} = a + bx
Where:
- \hat{y} is the predicted value of the dependent variable.
- x is the independent variable.
- a is the y-intercept.
- b is the slope.
|
Formulas for Slope (b) and Y-Intercept (a):
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
a = \bar{y} - b\bar{x}
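The same running sums give the slope and intercept; a sketch with invented data, compared against scipy.stats.linregress:
```python
from scipy import stats

# Hypothetical (x, y) pairs
x = [8, 12, 16, 20, 24, 30]
y = [0.2, 0.9, 1.8, 3.1, 4.0, 6.0]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = sum_y / n - b * (sum_x / n)                               # a = ȳ - b·x̄

print(f"ŷ = {a:.3f} + {b:.3f}x")
print(stats.linregress(x, y))  # slope and intercept should match b and a
```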
|
Example: Predicting number of mistakes (y) based on hours without sleep (x).
\hat{y} = -2.769 + 0.288x
For x = 40 hours:
\hat{y} = -2.769 + 0.288(40) ≈ 8.751 ≈ 9 mistakes.
|
Visual Representation: A scatter plot is a graph that displays the relationship between two variables.
Types of Relationships:
- Positive Linear: As x increases, y increases.
- Negative Linear: As x increases, y decreases.
- No Relationship: Points are scattered randomly.
- Nonlinear: Points follow a curved pattern.
|
Example: Plotting age (x) vs. income (y) data points on a scatter plot can reveal a positive linear relationship, indicating that income tends to increase with age.
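A minimal plotting sketch for this kind of check (Matplotlib; the age and income values are made up):
```python
import matplotlib.pyplot as plt

# Hypothetical age (x) vs. income (y) data
age = [22, 27, 31, 36, 42, 48, 55, 60]
income = [28, 34, 41, 47, 55, 62, 70, 74]  # in $1000s

plt.scatter(age, income)
plt.xlabel("Age (years)")
plt.ylabel("Income ($1000s)")
plt.title("Age vs. Income")
plt.show()  # points trending up and right suggest a positive linear relationship
```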
|
Analysis of Variance (ANOVA)
Purpose: To test if there is a significant difference between the means of three or more independent groups.
Hypotheses:
- H₀: μ₁ = μ₂ = μ₃ = … = μk (all means are equal)
- H₁: At least one mean is different
Test Statistic:
F = \frac{MS_{between}}{MS_{within}}
Where:
- MSbetween is the mean square between groups.
- MSwithin is the mean square within groups.
|
Degrees of Freedom:
- dfNumerator = k - 1 (k is the number of groups)
- dfDenominator = N - k (N is the total number of observations)
F-Distribution:
- Always right-tailed.
- Critical value is found using α, dfNumerator, and dfDenominator.
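The table lookup can also be done in code; a sketch with scipy.stats.f, assuming k = 3 groups and N = 15 observations (degrees of freedom consistent with the Fcrit ≈ 3.89 used in the example at the end of this section):
```python
from scipy import stats

alpha = 0.05
df_num, df_den = 2, 12   # k - 1 = 2 and N - k = 12 for k = 3, N = 15

f_crit = stats.f.ppf(1 - alpha, df_num, df_den)  # right-tail critical value
print(f"F critical = {f_crit:.2f}")              # ≈ 3.89
```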
|
Sums of Squares:
- SS_{Total} = \sum x^2 - \frac{(\sum x)^2}{N}
- SS_{Between} = \sum n_i (\bar{x}_i - \bar{X}_{GM})^2
- SS_{Within} = SS_{Total} - SS_{Between}
|
Steps for ANOVA:
- Calculate the Group Sums, Means, and Total Mean.
- Calculate the Sum of Squares Total (SSTotal).
- Calculate the Sum of Squares Between (SSBetween).
- Calculate the Sum of Squares Within (SSWithin).
- Calculate the Mean Squares:
- MSBetween = SSBetween / dfNumerator
- MSWithin = SSWithin / dfDenominator
- Calculate the F statistic: F = MSBetween / MSWithin
- Find the Critical Value: Using α, dfNumerator, and dfDenominator.
- Make a Decision: If F > Fcrit, reject H₀.
- State the Conclusion.
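Here is a sketch of the whole procedure on three invented groups, cross-checked with scipy.stats.f_oneway (the data are made up and do not reproduce the example below):
```python
from scipy import stats

# Hypothetical data: three independent groups
groups = [
    [12, 15, 14, 16, 13],   # e.g., Morning
    [18, 20, 17, 19, 21],   # e.g., Afternoon
    [14, 13, 15, 16, 12],   # e.g., Evening
]
all_obs = [x for g in groups for x in g]
N, k = len(all_obs), len(groups)
grand_mean = sum(all_obs) / N

ss_total = sum(x ** 2 for x in all_obs) - sum(all_obs) ** 2 / N
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)   # df numerator = k - 1
ms_within = ss_within / (N - k)     # df denominator = N - k
F = ms_between / ms_within

f_crit = stats.f.ppf(0.95, k - 1, N - k)
print(f"F = {F:.3f}, F critical = {f_crit:.3f}, reject H0: {F > f_crit}")
print(stats.f_oneway(*groups))      # should report the same F statistic
```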
|
Example: Testing if average number of students in an English course differs by time of day (Morning, Afternoon, Evening) at α = 0.05.
F ≈ 4.776, Fcrit = 3.89. Since 4.776 > 3.89, reject H₀. There is sufficient evidence to conclude that mean attendance differs by time of day.
|