Z-Score Calculator – Standard Score, Percentile & Probability
Calculate z-scores and convert to percentiles using the standard normal distribution. Try this free online math calculator for instant, accurate results.
What Is a Z-Score?
A z-score (also called a standard score) tells you exactly how many standard deviations a particular value lies above or below the mean of its dataset. The formula is deceptively simple: z = (x − μ) / σ, where x is your observed value, μ (mu) is the population mean, and σ (sigma) is the population standard deviation.
The power of z-scores lies in standardisation: by converting raw values to z-scores, you can compare measurements from completely different scales. A student scoring 78 on a biology test (mean 70, SD 10) has z = +0.8. The same student scoring 85 on a history test (mean 80, SD 3.33) has z = +1.5. Despite the raw score difference, the student performed relatively better in history — a fact invisible without z-score conversion.
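The comparison above takes only a few lines of code. A minimal Python sketch (the function name is ours, not a library API):

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

biology = z_score(78, 70, 10)    # +0.8
history = z_score(85, 80, 3.33)  # ~ +1.5 — the relatively better performance
```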
Z-scores are foundational in statistics, psychology, education, medicine, and quality control. They connect directly to probabilities under the normal distribution, allowing you to calculate the percentage of a population above, below, or between any two values.
The Standard Normal Distribution and Percentiles
When z-scores are plotted, they follow the standard normal distribution — a bell-shaped curve with mean = 0 and standard deviation = 1. The area under this curve represents probability: the area to the left of a z-score equals the percentile rank (the percentage of values falling below that z-score).
| Z-Score | Percentile | % Above | Interpretation |
|---|---|---|---|
| −3.0 | 0.13% | 99.87% | Extremely below average |
| −2.0 | 2.28% | 97.72% | Well below average |
| −1.5 | 6.68% | 93.32% | Below average |
| −1.0 | 15.87% | 84.13% | Slightly below average |
| −0.5 | 30.85% | 69.15% | Low average |
| 0.0 | 50.00% | 50.00% | Exactly at mean |
| +0.5 | 69.15% | 30.85% | High average |
| +1.0 | 84.13% | 15.87% | Slightly above average |
| +1.5 | 93.32% | 6.68% | Above average |
| +2.0 | 97.72% | 2.28% | Well above average |
| +3.0 | 99.87% | 0.13% | Extremely above average |
These percentiles come from the cumulative distribution function (CDF) of the normal distribution. In practice, you look them up in a z-table or calculate them using software (Excel's NORM.S.DIST, Python's scipy.stats.norm.cdf, or this calculator).
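If scipy is unavailable, the same CDF values can be computed with Python's standard library alone, since Φ(z) = ½(1 + erf(z/√2)); the helper name here is illustrative:

```python
from math import erf, sqrt

def percentile_from_z(z):
    """Percentile rank = standard normal CDF x 100, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0))) * 100.0

print(round(percentile_from_z(1.0), 2))   # 84.13, matching the table row for +1.0
print(round(percentile_from_z(-2.0), 2))  # 2.28, matching the row for -2.0
```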
The 68-95-99.7 Rule (Empirical Rule)
One of the most widely cited facts in statistics, the empirical rule describes the percentage of data falling within 1, 2, and 3 standard deviations of the mean in a normal distribution:
- ±1σ (z between −1 and +1): 68.27% of data
- ±2σ (z between −2 and +2): 95.45% of data
- ±3σ (z between −3 and +3): 99.73% of data
Equivalently, only about 4.6% of normally distributed data falls more than 2 standard deviations from the mean, and only 0.27% (about 1 in 370) falls beyond 3 standard deviations. This is why ±2σ is a common threshold for "significantly different from average" and ±3σ flags extreme outliers.

| Range | Data Included | Data Excluded | 1-in-N rarity |
|---|---|---|---|
| ±1σ | 68.27% | 31.73% | ~1 in 3 |
| ±2σ | 95.45% | 4.55% | ~1 in 22 |
| ±3σ | 99.73% | 0.27% | ~1 in 370 |
| ±4σ | 99.9937% | 0.0063% | ~1 in 15,787 |
| ±6σ | 99.9999998% | 0.0000002% | ~1 in 506,842,372 |
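The coverage figures in this table follow from the identity P(|Z| < k) = erf(k/√2), so they can be reproduced directly with the standard library (function name is ours):

```python
from math import erf, sqrt

def within_k_sigma(k):
    """Fraction of a normal distribution within ±k standard deviations."""
    return erf(k / sqrt(2.0))

for k in (1, 2, 3):
    print(f"±{k}σ: {within_k_sigma(k) * 100:.2f}%")  # 68.27%, 95.45%, 99.73%
```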
Six Sigma quality management aims to reduce manufacturing defects to fewer than 3.4 per million opportunities — a level that assumes a 1.5σ process shift over time, making it equivalent to a one-sided 4.5σ limit. The aspiration of "six sigma" performance is to make defects vanishingly rare.
Z-Scores in Standardised Testing
Standardised tests — SAT, ACT, IQ tests, GRE, GMAT — are designed to produce normally distributed scores that can be meaningfully converted to percentiles using z-scores. This enables comparison across different test forms (which may vary slightly in difficulty) and across years.
IQ Scores: Designed with mean = 100 and standard deviation = 15. An IQ of 130 has z = (130−100)/15 = +2.0, placing the person at the 97.7th percentile. An IQ of 145 has z = +3.0, placing them at the 99.87th percentile (roughly 1 in 740 people).
SAT Scores: Each section (Evidence-Based Reading/Writing and Math) has mean ~500 and SD ~100. A math score of 680 has z = (680−500)/100 = +1.8, approximately the 96th percentile. A combined score of 1400 (z ≈ +1.8–2.0) places a student in roughly the top 5% of test takers.
| Test | Mean | SD | Score of 1σ above mean | Percentile |
|---|---|---|---|---|
| IQ | 100 | 15 | 115 | 84th |
| SAT (each section) | 500 | 100 | 600 | 84th |
| ACT | 21 | 5 | 26 | 84th |
| GRE Verbal | 150 | 8.5 | 158.5 | 84th |
Z-Scores in Quality Control and Six Sigma
In manufacturing and process quality control, z-scores are used to measure process capability — how well a production process falls within specification limits. The process capability index Cp and Cpk are derived from z-score concepts.
Process capability: If a process mean is μ and standard deviation is σ, and specifications require the output to fall between Lower Spec Limit (LSL) and Upper Spec Limit (USL), then:
- z_upper = (USL − μ) / σ
- z_lower = (μ − LSL) / σ
- Cp = (USL − LSL) / (6σ) — measures spread relative to specification width
- Cpk = min(z_upper, z_lower) / 3 — accounts for process centring
A Cpk ≥ 1.33 is typically required in automotive and aerospace manufacturing (equivalent to ±4σ process capability). Medical device manufacturing often requires Cpk ≥ 1.67 (±5σ). The target of "Six Sigma" processes is Cpk = 2.0.
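The two indices can be computed directly from the formulas above. In this sketch the spec limits and process numbers are hypothetical, chosen so the off-centre mean visibly drags Cpk below Cp:

```python
def process_capability(mean, sd, lsl, usl):
    """Cp ignores centring; Cpk penalises an off-centre process mean."""
    z_upper = (usl - mean) / sd
    z_lower = (mean - lsl) / sd
    cp = (usl - lsl) / (6 * sd)
    cpk = min(z_upper, z_lower) / 3
    return cp, cpk

# Hypothetical process: spec 10 ± 0.3, mean drifted to 10.1, SD 0.05
cp, cpk = process_capability(mean=10.1, sd=0.05, lsl=9.7, usl=10.3)
print(round(cp, 2), round(cpk, 2))  # 2.0 1.33 — capable spread, imperfect centring
```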
Z-Scores in Medical Reference Ranges
Medical laboratories report test results relative to reference ranges, which are typically defined as the central 95% of a healthy population — corresponding to z-scores between −1.96 and +1.96. A result outside this range is flagged as "abnormal," though this simply means it is statistically unusual, not necessarily clinically concerning.
Bone density (DEXA scan): Results are reported as T-scores (comparison to young-adult norm) and Z-scores (comparison to age-matched norm):
- T-score ≥ −1.0: Normal
- T-score −1.0 to −2.5: Osteopenia
- T-score < −2.5: Osteoporosis
Growth charts: Children's height, weight, and head circumference are plotted as z-scores relative to age-sex norms. A child at the 50th percentile has z = 0; at the 97th percentile z = +1.88; at the 3rd percentile z = −1.88. Paediatric z-score cutoffs guide nutritional and developmental assessments.
Haematology: Blood counts (haemoglobin, white cells, platelets) have reference ranges expressed as mean ± 2SD. Values beyond these ranges trigger clinical review, though individual variation and laboratory differences mean clinical context is essential.
Hypothesis Testing and Z-Tests
Z-scores form the basis of the z-test, one of the most commonly used hypothesis tests in statistics. When testing whether a sample mean differs significantly from a known population mean, you calculate:
z = (x̄ − μ₀) / (σ / √n)
where x̄ is the sample mean, μ₀ is the hypothesised population mean, σ is the known population standard deviation, and n is the sample size.
If |z| > 1.96, the result is statistically significant at the α = 0.05 level (two-tailed). If |z| > 2.576, it is significant at α = 0.01. These critical values come directly from the normal distribution: 95% of the distribution falls within ±1.96 SD, and 99% within ±2.576 SD.
| Significance Level (α) | Critical z-Value (two-tailed) | Interpretation |
|---|---|---|
| 0.10 (10%) | ±1.645 | 90% confidence |
| 0.05 (5%) | ±1.960 | 95% confidence (standard) |
| 0.01 (1%) | ±2.576 | 99% confidence |
| 0.001 (0.1%) | ±3.291 | 99.9% confidence |
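The z-test described above fits in a few lines of standard-library Python; the helper name and example numbers are hypothetical:

```python
from math import erf, sqrt

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-tailed one-sample z-test with known population sigma.
    Returns the z statistic and the two-tailed p-value."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

# Hypothetical: sample mean 103 vs hypothesised mean 100, sigma 15, n = 36
z, p = one_sample_z_test(103, 100, 15, 36)
print(round(z, 2), p < 0.05)  # 1.2 False — |z| < 1.96, not significant at alpha = 0.05
```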
Z-Score Limitations and When Not to Use Them
Z-scores and percentile calculations derived from them assume the underlying data follows a normal (Gaussian) distribution. Many real-world datasets violate this assumption:
- Income and wealth: Highly right-skewed — the mean is much higher than the median, and z-scores dramatically underestimate how rare extreme wealth is.
- Financial returns: Have "fat tails" — extreme events (market crashes, windfalls) occur far more often than a normal distribution predicts. Normal-based risk models drastically underestimated the probability of the losses seen in the 2008 financial crisis.
- Social media metrics: Followers, likes, and views follow power-law distributions, not normal distributions. Z-scores are meaningless here.
- Small samples: With fewer than ~30 observations, the t-distribution (with heavier tails) is more appropriate than the z-distribution.
Before applying z-score analysis, always check that your data is approximately normally distributed using histograms, Q-Q plots, or formal normality tests (Shapiro-Wilk, Anderson-Darling). If the data is non-normal, consider transformations (log, square root) or non-parametric alternatives.
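Formal tests such as Shapiro-Wilk live in scipy.stats; as a quick dependency-free first screen, you can compute sample skewness, which is near 0 for symmetric data and large for skewed data. A sketch (this is a rough screen, not a substitute for a proper normality test):

```python
def sample_skewness(data):
    """Moment-based skewness: ~0 for symmetric data, >0 for a right-skewed tail."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

print(round(sample_skewness([1, 2, 3, 4, 5]), 2))  # 0.0  (symmetric)
print(sample_skewness([1, 1, 1, 2, 10]) > 1)       # True (heavily right-skewed)
```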
Frequently Asked Questions
What does a z-score of 1.5 mean?
A z-score of 1.5 means the value is 1.5 standard deviations above the mean, placing it at approximately the 93rd percentile. About 93.3% of values in a normal distribution fall below this point, and 6.7% fall above it.
What is a good z-score?
"Good" depends on context. For test scores or performance metrics, higher z-scores are better. For risk indicators (cholesterol, blood pressure), z-scores near 0 are healthiest. In quality control, z-scores beyond ±3 flag defects or outliers. There is no universally "good" z-score — it depends on what is being measured.
How do I calculate a z-score?
Subtract the mean from your value, then divide by the standard deviation: z = (x − μ) / σ. Example: score of 85, mean 70, SD 10 → z = (85−70)/10 = 1.5. This means the score is 1.5 standard deviations above the class average.
What is the z-score for the 95th percentile?
The z-score corresponding to the 95th percentile is approximately +1.645 (one-tailed). This is also the critical value for a one-tailed significance test at α = 0.05. For the two-tailed 95% range (i.e., central 95% of the distribution), the cutoffs are ±1.96.
Can a z-score be negative?
Yes. A negative z-score means the value is below the mean. A z-score of −1.0 means the value is one standard deviation below the mean, at the 15.87th percentile. Z-scores range from −∞ to +∞, though values beyond ±4 are extremely rare in normally distributed data.
What is the difference between a z-score and a t-score?
Both standardise data relative to mean and standard deviation. A z-score assumes the population standard deviation (σ) is known. A t-score (or t-statistic) uses the sample standard deviation (s) as an estimate when σ is unknown, and follows the heavier-tailed t-distribution. For large samples (n > 30), t and z are nearly identical.
How is z-score used in finance?
The Altman Z-score predicts corporate bankruptcy risk using a weighted combination of financial ratios. In risk management, z-scores translate a confidence level into a loss threshold (Value at Risk). In algorithmic trading, z-scores of price spreads identify mean-reversion opportunities (pairs trading).
What percentage of data falls within 2 standard deviations?
Approximately 95.45% of data falls within ±2σ of the mean in a normal distribution (the empirical rule). The exact figure is 95.449%. The complementary 4.551% lies beyond ±2σ — 2.275% in each tail. This is why ±2σ is the standard threshold for "statistically significant" in many fields (α = 0.05, two-tailed).
How do I convert a z-score to a percentile?
Look up the z-score in a standard normal table (z-table), which gives the cumulative probability; multiply by 100 for the percentile. For example, z = 1.0 → 0.8413 → 84th percentile. Equivalently, percentile = Φ(z) × 100, where Φ is the standard normal CDF. Excel: =NORM.S.DIST(z,TRUE)*100.
What is the z-score used for in quality control?
In Six Sigma quality management, z-scores measure process capability. A process running at ±3σ (z = 3) produces 2,700 defects per million. At ±6σ (z = 6) it produces just 3.4 defects per million (accounting for typical process drift). Cp and Cpk indices directly use z-score concepts to quantify how well a process meets specifications.
Outlier Detection Using Z-Scores
One of the most common practical applications of z-scores is outlier detection — identifying data points that are unusually far from the mean and may represent errors, extraordinary events, or genuinely unusual observations requiring investigation.
The standard threshold for flagging outliers is |z| > 3. Values more than 3 standard deviations from the mean are expected in only 0.27% of observations under a normal distribution — roughly 1 in 370 data points. In a dataset of 1,000 measurements, you would expect only ~3 values beyond ±3σ by chance. If you find 20 such values, something unusual is happening — equipment malfunction, data entry errors, or genuine extreme observations.
More stringent criteria are used in specific fields:
- Medical devices: Alarm thresholds at ±2σ (5% alarm rate) to ±3σ (0.27% alarm rate) depending on clinical urgency
- Financial markets: "Fat tail" events beyond ±4σ occur far more frequently than a normal distribution predicts — the 2008 financial crisis involved moves of 5–7σ that were theoretically "impossible" under normal distribution assumptions
- Quality control: Values beyond ±3σ (defects in the Six Sigma framework) require process investigation and root-cause analysis
- Scientific research: The 5σ threshold (|z| > 5) is required for claiming a particle physics discovery (as in the 2012 Higgs boson announcement at CERN)
| Z-Score Threshold | % Flagged (normal) | Used In |
|---|---|---|
| |z| > 2.0 | 4.55% | Initial data screening |
| |z| > 2.5 | 1.24% | Medical reference ranges |
| |z| > 3.0 | 0.27% | Quality control, outlier detection |
| |z| > 4.0 | 0.0063% | Process defect analysis |
| |z| > 5.0 | 0.00006% | Particle physics discovery claim |
Important caveat: real-world data often has heavier tails than the normal distribution predicts (leptokurtic distributions). Always inspect outliers manually — a z-score of 4 might be a data entry error (48 recorded as 4.8) or a genuine extreme value with important meaning. Never automatically delete outliers without investigation.
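The |z| > 3 rule described above takes only a few lines; the function name and the sample data are our own. Note a known weakness this example hides: a large outlier inflates the mean and SD used to score it, which can mask other outliers in small samples.

```python
def flag_outliers(data, threshold=3.0):
    """Return (index, value, z-score) for points whose |z| exceeds the threshold."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5  # sample SD
    flagged = []
    for i, x in enumerate(data):
        z = (x - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged

readings = [10.0] * 20 + [100.0]  # twenty normal readings plus one obvious outlier
print(flag_outliers(readings))    # flags only index 20 (z is roughly +4.4)
```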
Z-Scores in Finance and Risk Management
In finance, z-scores have multiple critical applications beyond academic statistics. The most famous is the Altman Z-Score (1968), a bankruptcy prediction model that combines five financial ratios into a single discriminant score:
Z = 1.2×(Working Capital/Total Assets) + 1.4×(Retained Earnings/Total Assets) + 3.3×(EBIT/Total Assets) + 0.6×(Market Cap/Total Liabilities) + 1.0×(Revenue/Total Assets)
Altman Z-Score interpretation: Z > 2.99 = Safe zone; 1.81–2.99 = Grey zone; Z < 1.81 = Distress zone (high bankruptcy risk). The model correctly predicted bankruptcy in 94% of cases in original studies and remains widely used by credit analysts and investors today.
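The five-ratio formula and the zone cutoffs above translate directly into code. The financial figures in the example are entirely hypothetical:

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_equity, sales, total_assets, total_liabilities):
    """Original (1968) Altman Z-score for public manufacturers, with zone label."""
    z = (1.2 * working_capital / total_assets
         + 1.4 * retained_earnings / total_assets
         + 3.3 * ebit / total_assets
         + 0.6 * market_equity / total_liabilities
         + 1.0 * sales / total_assets)
    if z > 2.99:
        return z, "safe"
    if z >= 1.81:
        return z, "grey"
    return z, "distress"

# Hypothetical firm, all figures in $M
z, zone = altman_z(working_capital=20, retained_earnings=30, ebit=15,
                   market_equity=80, sales=120,
                   total_assets=100, total_liabilities=50)
print(round(z, 2), zone)  # z just above 2.99, so "safe"
```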
Value at Risk (VaR): In portfolio risk management, VaR uses z-scores to quantify potential losses. The 1-day 95% VaR for a portfolio with daily return mean μ and standard deviation σ is VaR = −(μ + zσ) × portfolio value, where z = −1.645 (the 5th percentile of the standard normal). If a $1M portfolio has daily μ = 0% and σ = 1%, the 95% VaR = 1.645% × $1M = $16,450: there is a 5% chance of losing more than $16,450 in a single day.
| Confidence Level | Z-Score Used | Interpretation |
|---|---|---|
| 90% | −1.282 | Loss exceeds VaR on ~10% of trading days |
| 95% | −1.645 | Loss exceeds VaR on ~5% of trading days |
| 99% | −2.326 | Loss exceeds VaR on ~1% of trading days |
| 99.9% | −3.090 | Loss exceeds VaR on ~0.1% of trading days |
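The VaR arithmetic above is a one-liner; the default z of −1.645 corresponds to the 95% confidence row, and the function name is illustrative:

```python
def parametric_var(value, mu, sigma, z=-1.645):
    """1-day parametric (normal) VaR at the confidence level implied by z."""
    return -(mu + z * sigma) * value

# $1M portfolio, daily mean 0%, daily SD 1%, 95% confidence
print(round(parametric_var(1_000_000, 0.0, 0.01)))  # 16450
```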
Calculating Z-Scores with Sample Data
When working with a sample (rather than a known population), you estimate the population parameters from the sample: the sample mean (x̄) estimates μ, and the sample standard deviation (s) estimates σ. The z-score formula stays the same: z = (x − x̄) / s.
With small samples, however, the resulting scores follow the t-distribution (not the normal distribution) because of the added uncertainty in estimating σ. The t-distribution has heavier tails, reflecting this greater uncertainty. For samples of 30 or more, the t-distribution and normal distribution are nearly identical, and z-scores from either calculation are approximately equivalent.
When you have a dataset and want to standardise all values (convert the entire dataset to z-scores), this is called feature scaling or standardisation in machine learning. It is a preprocessing step that puts all features on the same scale (mean = 0, SD = 1), preventing features with larger absolute values from dominating distance-based algorithms (KNN, SVM, neural networks). After standardisation, each feature's z-scores are directly comparable regardless of original units or scale.
To standardise a dataset in Python: `from sklearn.preprocessing import StandardScaler; scaler = StandardScaler(); X_scaled = scaler.fit_transform(X)`. In Excel, compute `=STANDARDIZE(value, AVERAGE(range), STDEV(range))` for each value in a column.
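For a single column, standardisation needs no libraries at all. A dependency-free sketch (note it divides by the population SD, ddof = 0, which we believe matches StandardScaler's behaviour; use the sample SD s if that is what your analysis calls for):

```python
def standardise(values):
    """Convert a list of raw values to z-scores: mean 0, SD 1 (population SD)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

print(standardise([2, 4, 6]))  # symmetric output, roughly [-1.22, 0.0, 1.22]
```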