Variance Calculator - Population and Sample Variance
Calculate the variance and standard deviation of a data set. Supports both population and sample variance. Free statistics calculator.
What Is Variance?
Variance measures the spread of a dataset — how far the values are from the mean. A low variance means data points cluster near the mean; a high variance means they are spread out widely.
Variance is calculated as the average of squared differences from the mean:
- Population variance (σ²): σ² = Σ(xᵢ − μ)² / N
- Sample variance (s²): s² = Σ(xᵢ − x̄)² / (N−1)
Where xᵢ is each data point, μ (or x̄) is the mean, and N is the number of values. The standard deviation is simply the square root of variance — it is in the same units as the original data, making it more interpretable.
Why do we square the differences? Two reasons: (1) squaring eliminates negative values so that deviations above and below the mean don't cancel out, and (2) squaring gives disproportionate weight to outliers, making variance sensitive to extreme values. This property is both a strength (outlier detection) and a weakness (outlier sensitivity). For data with extreme outliers, consider using the median absolute deviation (MAD) as a more robust alternative.
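The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name and `sample` flag are our own choices:

```python
def variance(data, sample=True):
    """Variance of a list of numbers.

    sample=True  -> divide by N-1 (sample variance, s²)
    sample=False -> divide by N   (population variance, σ²)
    """
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)

data = [4, 7, 13, 2, 8]
print(variance(data, sample=False))  # population variance, ≈ 14.16
print(variance(data, sample=True))   # sample variance, ≈ 17.7
```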
Population vs. Sample Variance
The key difference is the denominator — N vs. (N−1) — known as Bessel's correction:
| Type | Denominator | Use When | Symbol |
|---|---|---|---|
| Population Variance | N | You have data on the entire population | σ² |
| Sample Variance | N−1 | You have a sample from a larger population | s² |
In practice, most real-world data is a sample. Using N−1 (sample variance) produces an unbiased estimate of the true population variance. Using N (population variance) on a sample systematically underestimates the true variance.
Example: Testing a new drug on 50 patients means using sample variance (s²). Analyzing all students in a classroom means using population variance (σ²).
Why does Bessel's correction work? When you calculate the sample mean, you use one "degree of freedom" — the mean is computed from the data itself, so the deviations from the mean are not fully independent. Dividing by (N−1) instead of N compensates for this loss of one degree of freedom, producing an unbiased estimator of the population variance. As N grows large, the difference between N and N−1 becomes negligible.
Step-by-Step Variance Calculation
Given the data set: 4, 7, 13, 2, 8
- Calculate the mean: (4+7+13+2+8) ÷ 5 = 34/5 = 6.8
- Find deviations from mean: (4−6.8)=−2.8; (7−6.8)=0.2; (13−6.8)=6.2; (2−6.8)=−4.8; (8−6.8)=1.2
- Square the deviations: 7.84; 0.04; 38.44; 23.04; 1.44
- Sum of squares: 7.84+0.04+38.44+23.04+1.44 = 70.8
- Population variance: 70.8 ÷ 5 = 14.16
- Sample variance: 70.8 ÷ 4 = 17.7
- Standard deviation: √14.16 = 3.76 (population) or √17.7 = 4.21 (sample)
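The hand calculation above can be verified with Python's built-in statistics module, which provides separate sample and population functions:

```python
import statistics

data = [4, 7, 13, 2, 8]
print(statistics.mean(data))       # 6.8
print(statistics.pvariance(data))  # population variance: 14.16
print(statistics.variance(data))   # sample variance: 17.7
print(statistics.pstdev(data))     # population SD, ≈ 3.76
print(statistics.stdev(data))      # sample SD, ≈ 4.21
```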
Shortcut Formula for Variance
There is an equivalent "computational" formula that avoids calculating deviations explicitly, useful when computing by hand or in spreadsheets:
σ² = (Σxᵢ²)/N − (Σxᵢ/N)² = (Σxᵢ² − (Σxᵢ)²/N) / N
For sample variance: s² = (Σxᵢ² − (Σxᵢ)²/N) / (N−1)
Using our example data (4, 7, 13, 2, 8):
- Σxᵢ = 34, so (Σxᵢ)² = 1,156
- Σxᵢ² = 16 + 49 + 169 + 4 + 64 = 302
- Population variance = (302 − 1156/5) / 5 = (302 − 231.2) / 5 = 70.8 / 5 = 14.16 ✓
- Sample variance = 70.8 / 4 = 17.7 ✓
This formula is algebraically identical to the definition, but in floating-point arithmetic it can lose precision when the values are large relative to their spread (the two large sums nearly cancel). For numerical stability, Welford's online algorithm (which processes one value at a time) is preferred in software implementations.
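A sketch of Welford's algorithm, which keeps a running mean and a running sum of squared deviations in a single pass (function name is ours):

```python
def welford_variance(data, sample=True):
    """One-pass (online) variance via Welford's algorithm.

    Numerically stable even when values are huge relative to their spread.
    """
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the running mean
    for count, x in enumerate(data, start=1):
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    n = len(data)
    return m2 / (n - 1 if sample else n)

print(welford_variance([4, 7, 13, 2, 8], sample=False))  # ≈ 14.16
# Same data shifted by a billion: the shortcut formula would lose precision,
# but Welford still returns ≈ 14.16.
print(welford_variance([x + 1e9 for x in [4, 7, 13, 2, 8]], sample=False))
```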
Related Statistical Measures
Variance is one of several measures of spread. Each has different strengths:
| Measure | Formula | Units | Robustness to Outliers | Best For |
|---|---|---|---|---|
| Variance (σ² or s²) | Avg. of squared deviations | Squared units | Low — very sensitive | Theoretical statistics, ANOVA |
| Standard Deviation (σ or s) | √Variance | Same as data | Low | Reporting spread in original units |
| Range | Max − Min | Same as data | Very low | Quick check, small samples |
| Interquartile Range (IQR) | Q3 − Q1 | Same as data | High | Skewed distributions, box plots |
| Mean Absolute Deviation (MAD) | Avg. of \|xᵢ − mean\| | Same as data | Moderate | Intuitive measure of spread |
| Coefficient of Variation (CV) | (SD / Mean) × 100% | Percentage | Low | Comparing spread across different scales |
For normal (bell-curve) distributions, the standard deviation has a special interpretation: approximately 68% of data falls within ±1 SD of the mean, 95% within ±2 SD, and 99.7% within ±3 SD. This is the empirical rule (68-95-99.7 rule).
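The empirical-rule percentages can be derived directly from the normal CDF using the standard library's NormalDist:

```python
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)  # any mean/SD gives the same percentages
for k in (1, 2, 3):
    within = nd.cdf(k) - nd.cdf(-k)  # probability of falling within ±k SD
    print(f"within ±{k} SD: {within:.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```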
Variance in Spreadsheets and Programming
Most tools have built-in variance functions. Make sure you choose the correct version (population vs. sample):
| Tool | Sample Variance | Population Variance |
|---|---|---|
| Excel / Google Sheets | VAR.S(range) or VAR(range) | VAR.P(range) or VARP(range) |
| Python (NumPy) | np.var(data, ddof=1) | np.var(data) |
| Python (statistics) | statistics.variance(data) | statistics.pvariance(data) |
| R | var(x) | var(x) * (n-1)/n |
| JavaScript | Manual calculation (no built-in) | Manual calculation |
| SQL (PostgreSQL) | VAR_SAMP(column) | VAR_POP(column) |
| MATLAB | var(x) | var(x, 1) |
Note: Python's NumPy defaults to population variance (ddof=0), while R's var() defaults to sample variance. This is a common source of confusion when comparing results across languages.
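The differing defaults are easy to demonstrate side by side (this assumes NumPy is installed):

```python
import numpy as np
import statistics

data = [4, 7, 13, 2, 8]

print(np.var(data))               # ≈ 14.16 -> population (ddof=0 is the default)
print(np.var(data, ddof=1))       # ≈ 17.7  -> sample
print(statistics.variance(data))  # ≈ 17.7  -> sample (the default here)
print(statistics.pvariance(data)) # ≈ 14.16 -> population
```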
Practical Applications of Variance
| Field | Application | Example |
|---|---|---|
| Finance | Investment risk | High variance = more volatile stock returns |
| Manufacturing | Quality control | Low variance = consistent product dimensions |
| Medicine | Clinical trials | Measuring variability in patient responses |
| Sports science | Performance analysis | Variability in athlete performance over season |
| Education | Test score analysis | Understanding spread of student performance |
Variance in Finance: Portfolio Risk
In finance, variance and standard deviation measure investment risk. Higher variance means returns fluctuate more — the investment is riskier. Harry Markowitz's Modern Portfolio Theory (1952, Nobel Prize 1990) uses variance as the central risk measure.
For a portfolio of two assets, the combined variance depends on individual variances and the correlation between assets:
σ²portfolio = w₁²σ₁² + w₂²σ₂² + 2·w₁·w₂·σ₁·σ₂·ρ₁₂
Where w = weight, σ² = variance, and ρ = correlation. When ρ < 1 (assets don't move in perfect lockstep), the portfolio variance is less than the weighted average of individual variances. This is the mathematical basis of diversification — combining uncorrelated assets reduces overall risk without proportionally reducing expected return.
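A quick sketch of the two-asset formula, using hypothetical numbers (a 60/40 mix with 15% and 4% volatility) to show how lower correlation lowers portfolio risk:

```python
import math

def portfolio_volatility(w1, sd1, sd2, rho):
    """Standard deviation of a two-asset portfolio; the second weight is 1 - w1."""
    w2 = 1 - w1
    var = w1**2 * sd1**2 + w2**2 * sd2**2 + 2 * w1 * w2 * sd1 * sd2 * rho
    return math.sqrt(var)

# Hypothetical 60/40 stock/bond mix: stock SD 15%, bond SD 4%
for rho in (1.0, 0.2, -0.2):
    sd = portfolio_volatility(0.6, 0.15, 0.04, rho)
    print(f"correlation {rho:+.1f}: portfolio SD = {sd:.2%}")
```

With ρ = 1 the portfolio SD equals the weighted average of the two SDs (10.6% here); any correlation below 1 pushes it lower.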
| Asset Class (2000–2023) | Annualized Return | Annualized SD (Volatility) |
|---|---|---|
| US Large Cap (S&P 500) | ~7.5% | ~15% |
| US Small Cap (Russell 2000) | ~7.0% | ~20% |
| International Developed (EAFE) | ~4.5% | ~17% |
| US Bonds (Aggregate) | ~4.0% | ~4% |
| Gold | ~8.0% | ~16% |
A portfolio combining stocks and bonds typically has a standard deviation significantly lower than stocks alone, while still capturing most of the equity return premium.
Variance in Quality Control (Six Sigma)
Manufacturing uses variance to control product quality. The Six Sigma methodology, developed by Motorola in the 1980s, aims to reduce process variance until virtually no products fall outside specification limits.
| Sigma Level | Defects per Million (DPMO) | Yield | Process Capability (Cpk) |
|---|---|---|---|
| 1σ | 691,462 | 30.9% | 0.33 |
| 2σ | 308,538 | 69.1% | 0.67 |
| 3σ | 66,807 | 93.3% | 1.00 |
| 4σ | 6,210 | 99.38% | 1.33 |
| 5σ | 233 | 99.977% | 1.67 |
| 6σ | 3.4 | 99.99966% | 2.00 |
A process operating at 6σ produces only 3.4 defects per million opportunities. The process capability index Cpk directly relates to variance: Cpk = min(USL − μ, μ − LSL) / (3σ), where USL and LSL are the upper and lower specification limits (the nearer limit governs). Reducing variance (through better machines, training, or materials) increases Cpk and pushes the process toward Six Sigma quality.
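The Cpk formula translates directly into code; the numbers below are illustrative:

```python
def cpk(mean, sd, lsl, usl):
    """Process capability index: distance from the mean to the nearest
    specification limit, measured in units of 3 standard deviations."""
    return min(usl - mean, mean - lsl) / (3 * sd)

# A perfectly centered process with spec limits at ±6σ from the mean
print(cpk(mean=50.0, sd=0.05, lsl=49.7, usl=50.3))  # 2.0 (Six Sigma capability)
```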
Worked Examples from Different Fields
These real-world examples show how variance is calculated and interpreted in practice:
Example 1: Stock Return Volatility
Monthly returns for a stock over 6 months: +3.2%, −1.5%, +4.8%, −0.7%, +2.1%, +1.6%
- Mean = (3.2−1.5+4.8−0.7+2.1+1.6) / 6 = 9.5/6 = 1.583%
- Deviations: 1.617, −3.083, 3.217, −2.283, 0.517, 0.017
- Squared: 2.615, 9.504, 10.349, 5.212, 0.267, 0.0003
- Sum of squares = 27.947
- Sample variance = 27.947/5 = 5.589 (%²)
- Standard deviation = √5.589 = 2.364% per month
- Annualized volatility ≈ 2.364% × √12 = 8.19%
This stock has moderate volatility. The S&P 500 historically has ~15% annualized volatility, so this stock is roughly half as volatile as the broad market.
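The volatility calculation above takes only a few lines with the standard library:

```python
import math
import statistics

returns = [3.2, -1.5, 4.8, -0.7, 2.1, 1.6]  # monthly returns, in %

monthly_sd = statistics.stdev(returns)   # sample SD, ≈ 2.364% per month
annualized = monthly_sd * math.sqrt(12)  # scale by √12 for annual volatility, ≈ 8.19%
print(f"monthly SD: {monthly_sd:.3f}%, annualized: {annualized:.2f}%")
```

The √12 scaling assumes monthly returns are independent, the standard convention for annualizing volatility.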
Example 2: Manufacturing Quality Control
A factory produces bolts with target length 50.00 mm. A sample of 8 bolts measures: 50.02, 49.98, 50.05, 49.97, 50.01, 50.03, 49.99, 50.00 mm.
- Mean = 400.05/8 = 50.00625 mm
- Sample variance = 0.0049875/7 = 0.000713 mm²
- Standard deviation = 0.0267 mm
- With spec limits of 50.00 ± 0.10 mm: Cpk = (50.10 − 50.006) / (3 × 0.0267) = 1.17
A Cpk of 1.17 means the process is capable but has little margin. The industry standard target is Cpk ≥ 1.33 (4σ capability), so this process needs tighter control to achieve that level.
Example 3: Student Test Scores
A class of 10 students scores: 72, 85, 90, 68, 77, 95, 83, 79, 88, 73 on an exam.
- Mean = 810/10 = 81.0
- Population variance (entire class) = 680/10 = 68.0
- Standard deviation = 8.25
- Coefficient of variation = 8.25/81.0 × 100% = 10.2%
A CV of 10.2% indicates moderate spread — most students performed within a reasonable range of the mean. If CV exceeded 25%, the instructor might investigate whether the test had questions that were too difficult for some students or whether there was a bimodal distribution (two distinct groups).
Common Mistakes When Calculating Variance
Avoid these frequent errors:
| Mistake | Why It's Wrong | Correction |
|---|---|---|
| Using N instead of N−1 for samples | Underestimates true population variance | Use N−1 for any data that's a sample from a larger population |
| Averaging absolute deviations instead of squaring | Gives MAD, not variance | Square each deviation, then average. Take √ for standard deviation |
| Forgetting to square before averaging | Positive and negative deviations cancel out, giving ~0 | Always square deviations first |
| Comparing variance across different scales | Variance depends on units; $² ≠ kg² | Use coefficient of variation (CV) for cross-scale comparison |
| Assuming variance = standard deviation | Variance is SD²; units are squared | Take the square root of variance to get SD |
ANOVA: Comparing Variance Across Groups
Analysis of Variance (ANOVA) is a statistical test that compares means of multiple groups by analyzing variance. Despite the name, it tests whether group means differ, not whether variances differ.
ANOVA partitions total variance into two components:
- Between-group variance: How much group means differ from the overall mean
- Within-group variance: How much individual values vary within each group
The F-statistic = Between-group variance / Within-group variance. A large F means the groups are more different from each other than expected by chance. If F exceeds the critical value (or p < 0.05), at least one group mean is significantly different.
Example: Comparing test scores of students taught by three different methods. ANOVA tells you whether the teaching method matters; post-hoc tests (Tukey, Bonferroni) tell you which methods differ.
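The F-statistic can be computed by hand to make the variance partition concrete. This sketch uses made-up scores for three teaching methods:

```python
import statistics

def one_way_anova_f(groups):
    """F = mean square between groups / mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: spread of individual scores around their own group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)      # df = k - 1
    ms_within = ss_within / (n_total - k)  # df = N - k
    return ms_between / ms_within

# Hypothetical test scores under three teaching methods
method_a = [72, 75, 78, 80]
method_b = [82, 85, 88, 84]
method_c = [70, 68, 74, 71]
print(one_way_anova_f([method_a, method_b, method_c]))  # ≈ 24.1, far above chance
```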
💡 Did you know?
- Variance was introduced by Ronald Fisher in 1918 — the same paper where he coined the term "variance."
- In finance, variance is the basis of Modern Portfolio Theory. A portfolio's variance depends not just on individual asset variance but on correlations between assets.
- The coefficient of variation (CV = standard deviation / mean × 100%) allows comparing variability across datasets with different units or scales.
- Chebyshev's inequality guarantees that for any distribution (not just normal), at least 75% of data falls within ±2 standard deviations and at least 89% within ±3 standard deviations. This is weaker than the empirical rule but applies universally.
Frequently Asked Questions
What is the difference between variance and standard deviation?
Variance is the average of squared deviations from the mean; standard deviation is its square root. Standard deviation is in the same units as the original data (e.g., dollars, kg, seconds), making it more interpretable. Variance is useful in mathematical operations (variances of independent variables add directly), while standard deviation is better for describing spread to a non-technical audience.
When should I use sample vs. population variance?
Use population variance when your data contains every member of the group you're analyzing (e.g., all employees in one company). Use sample variance when your data is a subset of a larger group (e.g., a survey of 500 voters to estimate all voters' opinions). In most real-world research and statistics, sample variance is appropriate.
Can variance be negative?
No. Variance is always zero or positive because it is calculated from squared values. Variance = 0 only when all data points are identical (no spread). A negative variance is mathematically impossible and indicates a calculation error.
What is a "high" or "low" variance?
High and low are relative to the scale and context of the data. A variance of 10 is "low" for human heights in cm but "high" for heights in meters. The coefficient of variation (SD / mean × 100%) is scale-independent and allows comparison across different datasets. In quality control, specifications define acceptable variance ranges for each measurement.
How does variance relate to the normal distribution?
The normal (Gaussian) distribution is fully described by just two parameters: the mean (μ) and the variance (σ²). The familiar bell curve is wider when variance is large and narrower when variance is small. For normal data, the empirical rule holds: 68.3% within ±1σ, 95.4% within ±2σ, and 99.7% within ±3σ. Many statistical tests (t-test, ANOVA, regression) assume data follows a normal distribution or that sample means are approximately normal (via the Central Limit Theorem).
What is pooled variance?
Pooled variance is a weighted average of sample variances from two or more groups, used in the two-sample t-test when you assume equal variances across groups. The formula is: s²pooled = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁ + n₂ − 2). This produces a single variance estimate that incorporates information from both samples, increasing statistical power when the equal-variance assumption is valid.
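The pooled-variance formula in code, with made-up variances and sample sizes for illustration:

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of two sample variances, weighted by degrees of freedom.
    Used in the two-sample t-test under the equal-variance assumption."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Hypothetical samples: s1² = 17.7 with n1 = 5, s2² = 12.0 with n2 = 8
print(pooled_variance(17.7, 5, 12.0, 8))  # (4×17.7 + 7×12.0) / 11 ≈ 14.07
```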