Chi-Square Calculator
Calculate the chi-square statistic, p-value, and degrees of freedom for a 2×2 contingency table. Instantly test independence between two categorical variables. Free statistics tool.
What Is the Chi-Square Test?
The chi-square (χ²) test is one of the most widely used statistical tests for analyzing categorical data. It answers a fundamental question: Are two categorical variables independent, or is there a real relationship between them? Unlike tests for numerical data (like t-tests), chi-square works with counts — how many items fall into each category.
Classic examples of chi-square applications:
- Does smoking status (smoker/non-smoker) relate to lung disease (yes/no)?
- Is there a relationship between training frequency (low/high) and injury rate (yes/no)?
- Does gender affect product preference (product A/product B)?
- Is a die fair? (Does each face appear approximately 1/6 of the time?)
The test works by comparing observed frequencies (what you actually counted) to expected frequencies (what you'd expect if the two variables were completely independent). If the observed counts deviate far enough from expected, you reject the hypothesis of independence and conclude the variables are related.
For a 2×2 table (two categories for each of two variables), the test has 1 degree of freedom. The resulting chi-square statistic is compared to a critical value or converted to a p-value to determine statistical significance.
The Chi-Square Formula for a 2×2 Table
Given a 2×2 contingency table with observed counts A, B, C, D:
|  | Category 1 | Category 2 | Row Total |
|---|---|---|---|
| Group 1 | A | B | A + B |
| Group 2 | C | D | C + D |
| Col Total | A + C | B + D | N = A+B+C+D |
Step 1 — Calculate expected frequencies:
E_A = (A+B)(A+C)/N
E_B = (A+B)(B+D)/N
E_C = (C+D)(A+C)/N
E_D = (C+D)(B+D)/N
Step 2 — Calculate chi-square:
χ² = Σ [(O − E)² / E]
= (A−E_A)²/E_A + (B−E_B)²/E_B + (C−E_C)²/E_C + (D−E_D)²/E_D
Or equivalently for a 2×2 table, the shortcut formula:
χ² = N(AD − BC)² / [(A+B)(C+D)(A+C)(B+D)]
Step 3 — Degrees of freedom: For a 2×2 table: df = (rows − 1) × (cols − 1) = 1 × 1 = 1
Step 4 — Find p-value: Compare χ² to the chi-square distribution with df=1. At 95% confidence (α=0.05), the critical value is 3.841. If χ² > 3.841, the result is statistically significant — you reject independence.
Worked example: A = 50 (runners who trained high frequency, no injury), B = 30 (high freq, injury), C = 20 (low freq, no injury), D = 40 (low freq, injury). N = 140.
- χ² = 140 × (50×40 − 30×20)² / [(80)(60)(70)(70)] = 140 × (2000 − 600)² / 23,520,000
- = 140 × 1400² / 23,520,000 = 140 × 1,960,000 / 23,520,000 ≈ 11.67
- p-value ≈ 0.00064 — highly significant. Training frequency and injury are not independent.
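The worked example can be reproduced with a short Python sketch (the function name `chi_square_2x2` is ours, not from any library). For df = 1 the p-value follows from the identity P(χ²₁ > x) = erfc(√(x/2)), so only the standard library is needed:

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value (df = 1) for a 2x2 table using the
    shortcut formula: chi2 = N(AD - BC)^2 / [(A+B)(C+D)(A+C)(B+D)]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1: P(chi2_1 > x) = erfc(sqrt(x / 2))
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

chi2, p = chi_square_2x2(50, 30, 20, 40)  # the training/injury example
print(round(chi2, 2), round(p, 5))        # → 11.67 0.00064
```

Since 11.67 exceeds the critical value 3.841, the code agrees with the hand calculation: reject independence.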
Chi-Square Critical Value Table
Reference values for chi-square significance testing at 1 degree of freedom (2×2 table):
| α threshold | Confidence | Critical χ² (df=1) | Interpretation |
|---|---|---|---|
| 0.10 | 90% confidence | 2.706 | Suggestive trend |
| 0.05 | 95% confidence | 3.841 | Statistically significant ✓ |
| 0.01 | 99% confidence | 6.635 | Highly significant ✓✓ |
| 0.001 | 99.9% confidence | 10.828 | Very highly significant ✓✓✓ |
For larger contingency tables, degrees of freedom increase. For a 2×3 table: df = (2-1)(3-1) = 2. For a 3×3 table: df = 4. The critical value increases with degrees of freedom, so larger tables require higher chi-square values to achieve significance.
| Table Size | df | Critical χ² (p=0.05) | Critical χ² (p=0.01) |
|---|---|---|---|
| 2×2 | 1 | 3.841 | 6.635 |
| 2×3 or 3×2 | 2 | 5.991 | 9.210 |
| 3×3 | 4 | 9.488 | 13.277 |
| 4×4 | 9 | 16.919 | 21.666 |
Assumptions and When Chi-Square Is Valid
The chi-square test has several important assumptions that must be met for valid results:
- Independence of observations: Each person or item should appear in only one cell. If the same individuals are measured twice (before/after), use McNemar's test instead.
- Sufficient expected frequencies: All expected cell frequencies should be ≥ 5. If any expected cell has fewer than 5, chi-square may be unreliable. Use Fisher's Exact Test instead for small samples.
- Random sampling: Data should come from a random or representative sample, not a convenience sample of volunteers who might systematically differ from the population.
- Categorical data only: Chi-square is for counts/categories. Do not use it for continuous numerical data — use correlation, t-tests, or ANOVA instead.
Yates' continuity correction: For 2×2 tables with small samples (n < 40 or any expected frequency < 10), Yates' correction subtracts 0.5 from each |O-E| before squaring, reducing the chi-square value. This correction is controversial — some statisticians recommend Fisher's Exact Test instead. Our calculator shows both the uncorrected and Yates-corrected values.
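To see how the correction changes the statistic, here is a sketch of the 2×2 shortcut form of Yates' correction (our own helper, not a library function); the clamp to zero prevents overcorrection when |AD − BC| < N/2:

```python
def yates_chi_square_2x2(a, b, c, d):
    """Yates-corrected chi-square for a 2x2 table:
    chi2 = N(|AD - BC| - N/2)^2 / [(A+B)(C+D)(A+C)(B+D)]."""
    n = a + b + c + d
    num = max(abs(a * d - b * c) - n / 2, 0)  # clamp avoids overcorrection
    return n * num ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Same table as the worked example: the correction pulls 11.67 down
print(round(yates_chi_square_2x2(50, 30, 20, 40), 2))  # → 10.53
```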
Fisher's Exact Test: For 2×2 tables with very small samples (total N < 20, or any expected cell < 5), Fisher's Exact Test calculates the exact probability of observing the given cell counts, without relying on the chi-square approximation. It's the gold standard for small samples in medicine and biology.
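For intuition, a two-sided Fisher's exact p-value for a 2×2 table can be computed from hypergeometric probabilities with the standard library alone. This illustrative sketch (`fisher_exact_2x2` is our own name) sums the probabilities of all tables, with the same margins, that are no more likely than the observed one:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    def prob(x):  # P(top-left cell = x) under fixed margins (hypergeometric)
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # small tolerance handles floating-point ties with p_obs
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Fisher's classic "lady tasting tea" table: p = 34/70
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # → 0.4857
```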
Effect Size: Phi (φ) and Cramér's V
A statistically significant chi-square result tells you there's a real relationship — but not how strong it is. A very large sample can make even a tiny, practically meaningless association statistically significant. Effect size measures quantify the strength of the relationship independently of sample size.
| Measure | Formula | Range | When to Use |
|---|---|---|---|
| Phi (φ) | √(χ²/N) | 0 to 1 | 2×2 tables only |
| Cramér's V | √(χ²/(N×min(r-1,c-1))) | 0 to 1 | Any table size |
Interpretation guidelines (Cohen, 1988):
- Small effect: φ or V = 0.1 — relationship exists but is weak; practical impact may be minimal
- Medium effect: φ or V = 0.3 — moderate relationship, worth attention in applied research
- Large effect: φ or V = 0.5 — strong relationship, clearly visible in data visualization
Example: In our training frequency / injury example (χ² = 11.67, N = 140), phi = √(11.67/140) = √0.0834 = 0.29 — a medium effect size. The relationship is real and moderately strong, not just a statistical artifact of a large sample.
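Both measures are one-liners; note that for a 2×2 table Cramér's V reduces to phi, since min(r−1, c−1) = 1. A quick sketch (function names are ours):

```python
from math import sqrt

def phi_effect_size(chi2, n):
    """Phi coefficient for a 2x2 table: sqrt(chi2 / N)."""
    return sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    """Cramer's V for an r x c table: sqrt(chi2 / (N * min(r-1, c-1)))."""
    return sqrt(chi2 / (n * min(rows - 1, cols - 1)))

# Training-frequency / injury example: chi2 = 11.67, N = 140
print(round(phi_effect_size(11.67, 140), 2))  # → 0.29, a medium effect
```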
Chi-Square Goodness of Fit Test
Beyond testing independence between two variables, chi-square is also used to test whether a distribution of observed counts matches an expected (theoretical) distribution. This is the goodness of fit test.
Classic applications:
- Testing fairness: Roll a die 60 times. You expect each face ~10 times. Did you get close enough to claim the die is fair?
- Genetic ratios: Mendel's pea experiments tested whether observed offspring ratios (e.g., 3:1 dominant:recessive) matched the expected genetic ratios.
- Birth day distribution: Are births equally distributed across days of the week, or are certain days more common (due to scheduled C-sections and inductions)?
- Market share: Does a product's observed purchase distribution match the expected market share percentages?
Formula: χ² = Σ[(O_i − E_i)² / E_i] for each category i. Degrees of freedom = number of categories − 1 (minus any additional estimated parameters). The test follows the same chi-square distribution as the independence test.
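The die-fairness example translates directly into code; the roll counts below are made up for illustration, and 11.070 is the critical value for df = 5 at α = 0.05:

```python
def gof_chi_square(observed, expected):
    """Goodness-of-fit chi-square: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 60 rolls; a fair die expects 10 per face (df = 6 - 1 = 5)
observed = [8, 12, 9, 11, 6, 14]
chi2 = gof_chi_square(observed, [10] * 6)
print(round(chi2, 2))  # → 4.2
print(chi2 > 11.070)   # → False: cannot reject fairness at p = 0.05
```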
Real-World Applications in Sports and Health Research
Chi-square tests are a workhorse of sports science and epidemiological research because so much data in these fields is categorical.
Injury risk factors: Comparing injury rates between athletes who do warm-up drills vs. those who don't. Observed: 15 injuries among 80 warm-up athletes, 28 injuries among 70 non-warm-up athletes. Chi-square tests whether this 18.75% vs 40% difference is statistically significant or could be due to chance.
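Plugging these counts into the 2×2 shortcut formula (a sketch, with the table laid out as injured vs. not injured per group):

```python
from math import erfc, sqrt

# Rows: warm-up yes/no; columns: injured yes/no
a, b = 15, 80 - 15   # warm-up group: 15 injured, 65 not
c, d = 28, 70 - 28   # non-warm-up group: 28 injured, 42 not
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p = erfc(sqrt(chi2 / 2))  # p-value for df = 1
print(round(chi2, 2), round(p, 4))  # → 8.24 0.0041
```

The difference clears the 3.841 threshold comfortably, so the 18.75% vs 40% gap is unlikely to be chance.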
Supplement efficacy surveys: Did athletes who took supplement X (yes/no) report performance improvement (yes/no)? With 200 athletes in a 2×2 table, chi-square quickly determines if any observed association exceeds what chance alone would produce.
Doping control outcomes: Analyzing whether positive test rates differ across sports or competition levels. Each sport/level becomes a column; positive/negative becomes the row variable.
Medical screening: Is a new diagnostic test's result (positive/negative) associated with true disease status (gold-standard confirmed)? The same 2×2 table yields sensitivity and specificity, and chi-square confirms the test performs better than chance.
Survey response analysis: Do men and women respond differently to a categorical survey question (agree/disagree/neutral)? A 2×3 chi-square test (gender × response) answers this.
In all these cases, chi-square is the right choice because the data are frequencies of categorical outcomes, not continuous measurements. It's the statistical test that asks: "given our null hypothesis that these variables are unrelated, how surprising are our data?"
Frequently Asked Questions
What is the chi-square test used for?
The chi-square test is used to determine whether there is a significant relationship between two categorical variables (test of independence) or whether observed category counts match expected counts (goodness of fit test). It works with frequency data — counts of how many items fall into each category — not continuous measurements.
What p-value is significant in a chi-square test?
The conventional threshold is p < 0.05 (95% confidence). If your p-value is below 0.05, you reject the null hypothesis of independence and conclude the variables are significantly related. For stricter criteria, use p < 0.01 or p < 0.001. Always report the exact p-value rather than just saying "significant" or "not significant."
What are the degrees of freedom in a chi-square test?
For a contingency table: df = (number of rows − 1) × (number of columns − 1). For a 2×2 table: df = 1. For a 3×4 table: df = (3-1)(4-1) = 6. For a goodness of fit test: df = number of categories − 1. Degrees of freedom determine which chi-square distribution to use for finding the p-value.
What if my expected cell frequencies are less than 5?
If any expected cell frequency is less than 5, the standard chi-square approximation may be unreliable. Options: (1) Use Fisher's Exact Test, which gives exact p-values for small samples; (2) Combine categories if doing so makes substantive sense; (3) Apply Yates' continuity correction (controversial but sometimes used). For 2×2 tables with total N < 20, Fisher's Exact Test is strongly preferred.
Can chi-square tell me the strength of an association?
Chi-square significance alone does not indicate effect size. With a large enough sample, even a trivially weak relationship becomes statistically significant. Use Phi (φ) for 2×2 tables or Cramér's V for larger tables to measure effect size. Values near 0.1 are small, near 0.3 are medium, and near 0.5 are large effects.