Correlation Calculator — Pearson r and R²

What Is Correlation?

Correlation measures the strength and direction of the linear relationship between two numerical variables. When one variable changes, does the other tend to change in a predictable way? The Pearson correlation coefficient (r) quantifies this relationship on a scale from −1 to +1.

Examples of correlated variables include height and weight in adults, study hours and test scores, and weekly training mileage and race times.

The Pearson r coefficient answers: given one variable's value, how well can we predict the other? A value near +1 indicates a strong positive relationship (both variables increase together). A value near −1 indicates a strong negative relationship (one increases as the other decreases). A value near 0 indicates little to no linear relationship.

Important caveat: correlation does not imply causation. Two variables can be strongly correlated because they're both caused by a third variable, because the relationship is coincidental, or because of confounding factors in how the data was collected. Always interpret correlations in context before drawing causal conclusions.

The Pearson Correlation Formula

The Pearson correlation coefficient is calculated as the covariance of two variables divided by the product of their standard deviations:

r = Σ[(xi − x̄)(yi − ȳ)] / √[Σ(xi − x̄)² × Σ(yi − ȳ)²]

Equivalently, using raw scores:

r = [nΣxy − ΣxΣy] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}

Where n is the number of paired observations.
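As an illustrative sketch (not the calculator's actual implementation), the raw-score formula above translates directly into a few lines of Python:

```python
import math

def pearson_r(x, y):
    """Pearson r via the raw-score formula:
    r = [nΣxy − ΣxΣy] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}"""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

# A perfectly linear positive relationship gives r = 1
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 4))  # → 1.0
```

Note the raw-score form is algebraically identical to the deviation form above; it just avoids computing the means first.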

R² (coefficient of determination) = r² tells you the proportion of variance in Y that is explained by X. If r = 0.80, then R² = 0.64, meaning 64% of the variation in Y is accounted for by the linear relationship with X.

Worked example: Weekly training miles (X) and 5K time in minutes (Y) for 5 runners:

Runner | Miles/week (X) | 5K time (Y)
A      | 20             | 28
B      | 35             | 24
C      | 50             | 21
D      | 65             | 19
E      | 80             | 17

x̄ = 50, ȳ = 21.8. After computing the formula: r ≈ −0.99. As miles per week increase, 5K time decreases — an extremely strong negative correlation. R² ≈ 0.97, meaning training volume explains about 97% of the variance in race times for these runners.
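The deviation-score form of the formula reproduces this worked example; the sketch below hard-codes the five runners' data:

```python
import math

miles = [20, 35, 50, 65, 80]   # X: weekly training miles
times = [28, 24, 21, 19, 17]   # Y: 5K time in minutes

n = len(miles)
mx = sum(miles) / n            # x̄ = 50
my = sum(times) / n            # ȳ = 21.8
cov = sum((x - mx) * (y - my) for x, y in zip(miles, times))
sdx = math.sqrt(sum((x - mx) ** 2 for x in miles))
sdy = math.sqrt(sum((y - my) ** 2 for y in times))
r = cov / (sdx * sdy)
print(round(r, 3), round(r * r, 3))  # → -0.987 0.975
```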

Interpreting the Pearson r Value

r value         | Strength    | Direction | Example
+0.90 to +1.00  | Very strong | Positive  | Height vs weight in adults
+0.70 to +0.89  | Strong      | Positive  | Study hours vs test score
+0.50 to +0.69  | Moderate    | Positive  | Income vs education level
+0.30 to +0.49  | Weak        | Positive  | Shoe size vs IQ score
+0.10 to +0.29  | Very weak   | Positive  | Birth month vs sports success
−0.10 to +0.10  | Negligible  | None      | TV color preference vs salary
−0.30 to −0.10  | Very weak   | Negative  | Hours slept vs reaction time
−0.50 to −0.30  | Weak        | Negative  | Distance from work vs satisfaction
−0.70 to −0.50  | Moderate    | Negative  | Training load vs injury risk
−0.90 to −0.70  | Strong      | Negative  | Body fat % vs VO2 max
−1.00 to −0.90  | Very strong | Negative  | Speed vs race time

Note: These cutoffs (0.30, 0.50, 0.70) are Cohen's guidelines and are conventions, not absolute rules. In social science with many confounding variables, r = 0.30 may be meaningful. In physics where theory predicts r = 1.00, r = 0.90 might indicate a measurement problem. Always interpret r in the context of your field and research question.

Statistical Significance of Correlation

A correlation coefficient by itself doesn't tell you whether the relationship is statistically significant — that depends on both r and your sample size. With a small sample, even a moderately high r could be due to chance. With a large sample, even a small r can be statistically significant (though possibly not practically meaningful).

The t-test for correlation significance:

t = r × √(n−2) / √(1−r²), df = n−2

Minimum r needed for significance at p < 0.05 (two-tailed):

Sample Size (n) | Min |r| for p < 0.05 | Min |r| for p < 0.01
5               | 0.878                | 0.959
10              | 0.632                | 0.765
20              | 0.444                | 0.561
30              | 0.361                | 0.463
50              | 0.279                | 0.361
100             | 0.197                | 0.256
200             | 0.139                | 0.182
500             | 0.088                | 0.115

Key takeaway: with n = 10, you need r > 0.63 to be confident the correlation isn't due to chance. With n = 500, r = 0.09 is statistically significant but practically meaningless — X explains less than 1% of Y's variance. Always report both r and its significance, and consider practical significance alongside statistical significance.
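The t-test and the thresholds in the table can be cross-checked with a short sketch. The t critical value 2.306 (two-tailed, df = 8, p < 0.05) is a hard-coded constant taken from a standard t table, not computed here:

```python
import math

def t_statistic(r, n):
    """t = r × √(n−2) / √(1−r²), with df = n − 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def critical_r(t_crit, n):
    """Invert the t formula: the minimum |r| whose t reaches t_crit."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# For n = 10, the table above lists 0.632 as the minimum |r| at p < 0.05
print(round(critical_r(2.306, 10), 3))  # → 0.632
```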

Scatter Plot Interpretation Guide

Before calculating r, inspect a scatter plot: visual inspection reveals patterns that r alone may miss, such as nonlinearity, outliers, and distinct clusters.

Anscombe's Quartet (1973) famously demonstrated four datasets with identical r = 0.816 but completely different scatter patterns: one linear, one curved, one with an outlier, one perfectly linear except for one anomalous point. This underscores why visualizing data before interpreting correlations is essential.
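This is easy to verify numerically. The sketch below hard-codes the first two of Anscombe's four published datasets (the standard values): one is linear with noise, the other a smooth curve, yet both yield r ≈ 0.816:

```python
import math

def pearson_r(x, y):
    """Pearson r in deviation-score form."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]  # noisy linear
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]   # smooth curve

# Both print ≈ 0.816 despite completely different scatter shapes
print(round(pearson_r(x, y1), 3), round(pearson_r(x, y2), 3))
```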

Pearson vs Spearman vs Kendall Correlation

Pearson r is not the only correlation coefficient. Different situations call for different measures:

Method         | Data Type                  | Assumption                                | Best For
Pearson r      | Continuous                 | Linear relationship, normal distribution  | Height/weight, temperature/sales
Spearman ρ     | Ordinal or continuous      | Monotonic relationship (ranks)            | Rankings, survey scales, skewed data
Kendall τ      | Ordinal or small samples   | Monotonic relationship (pairs)            | Small n, many ties in ranking
Point-biserial | One binary, one continuous | Normal distribution of continuous var     | Gender vs test score, pass/fail vs study hours

When to prefer Spearman over Pearson: your data is ordinal (e.g., survey responses rated 1–5), your data is heavily skewed or has outliers, or you're testing for any monotonic relationship (not just linear). Spearman converts values to ranks before computing r, making it robust to outliers and non-normality. Our calculator computes Pearson r; for Spearman, rank your data first and then use the same formula.
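The "rank first, then apply Pearson" recipe looks like this in a minimal sketch, including the average-rank convention for ties (a detail the paragraph above glosses over):

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over a run of tied values
        avg = (i + j) / 2 + 1           # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            rank[order[k]] = avg
        i = j + 1
    return rank

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

# A monotonic but non-linear relationship: Spearman rates it as perfect
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # y = x³ → 1.0
```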

Correlation in Sports Science and Running Research

Correlation analysis is fundamental to sports science research. Understanding which training variables correlate with performance outcomes helps coaches and athletes make evidence-based decisions.

Training load and performance correlations: Studies consistently find strong negative correlations (r = −0.7 to −0.9) between weekly mileage and race times among recreational runners — more miles, faster times. However, correlation doesn't mean more is always better: beyond ~80–100 miles/week, injury risk increases and the performance-mileage correlation weakens or reverses.

Heart rate and pace: At steady-state aerobic effort, heart rate and running pace show strong positive correlation (r ≈ 0.85–0.95) for individual runners in controlled conditions. This forms the basis of heart rate training zones and aerobic threshold testing.

VO2 max and race performance: Maximal oxygen uptake correlates strongly with endurance performance across athletes (r = −0.70 to −0.85 with race times). However, within a group of elite runners with similarly high VO2 max, the correlation weakens — other factors like lactate threshold and running economy then distinguish performance.

Sleep and recovery metrics: Heart rate variability (HRV) correlates positively with perceived recovery status (r ≈ 0.5–0.7) in athletes. This moderately strong correlation has driven widespread adoption of HRV monitoring in elite sport, though the individual variability means population-level correlations don't perfectly predict individual responses.

Pitfalls in sports correlation research: Publication bias favors high correlations, so the literature may overstate relationships. Cross-sectional studies (measuring both variables at one time point) can't establish causation. Aggregating across athletes with different training levels (beginners vs. elites) can create artificially strong correlations that don't apply within any subgroup. Always consider the specific population when generalizing correlation findings.

Frequently Asked Questions

What is a good Pearson correlation coefficient?

It depends on the field. In psychology and social science, r > 0.50 is considered strong. In physics or engineering, r > 0.99 might be expected. For predicting athletic performance with a single training variable, r = 0.70–0.85 is typical and considered meaningful. Always interpret r in context — what matters is whether the relationship is strong enough for your specific application.

What does R² (R-squared) mean?

R² is the proportion of variance in Y explained by X. If r = 0.80, then R² = 0.64, meaning 64% of the variation in Y is accounted for by the linear relationship with X. The remaining 36% is due to other factors or random variation. R² ranges from 0 (no linear relationship) to 1 (perfect linear relationship).

Can correlation be used to predict one variable from another?

Correlation tells you about the strength of a relationship; for prediction, use linear regression. Regression gives you the equation of the best-fit line (Y = a + bX), enabling actual predictions. In simple linear regression, |r| is the square root of R², and the sign of r matches the sign of the slope. A strong correlation means regression will give you a reliable prediction equation.
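A minimal least-squares sketch makes the link concrete, reusing the runner data from the worked example earlier in this article:

```python
import math

def fit_line(x, y):
    """Least-squares fit Y = a + bX; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    b = sxy / sxx                       # slope
    a = my - b * mx                     # intercept
    r = sxy / math.sqrt(sxx * syy)      # r² here equals regression R²
    return a, b, r * r

miles = [20, 35, 50, 65, 80]
times = [28, 24, 21, 19, 17]
a, b, r2 = fit_line(miles, times)
predicted = a + b * 40                  # predicted 5K time at 40 miles/week
print(round(b, 3), round(r2, 3))        # → -0.18 0.975
```

Here the slope says each extra weekly mile is associated with a 0.18-minute faster 5K in this tiny sample, and r² matches the R² from the worked example.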

Why can two variables be correlated but not causally related?

Correlation without causation can arise from: (1) Third-variable causation — both X and Y are caused by Z (ice cream sales and drowning both increase in summer due to hot weather, not because of each other); (2) Reverse causation — Y causes X, not X causes Y; (3) Coincidence — spurious correlations in small samples or time-series data. Establishing causation requires experimental design (randomized controlled trials) or rigorous causal inference methods.

What sample size do I need for reliable correlation estimation?

A minimum of n = 30 pairs is a common rule of thumb for reliable Pearson r estimates. For detecting small correlations (r ≈ 0.2), you may need n = 150–200. For high-stakes research, power analysis should determine the exact n based on expected effect size and desired significance level. With n < 10, correlation estimates are highly unstable and should be interpreted with extreme caution.

⚡ Powered by RunCalc