A z-score (also called a standard score) tells you exactly how many standard deviations a particular value lies above or below the mean of its dataset. The formula is deceptively simple: z = (x − μ) / σ, where x is your observed value, μ (mu) is the population mean, and σ (sigma) is the population standard deviation.
The power of z-scores lies in standardisation: by converting raw values to z-scores, you can compare measurements from completely different scales. A student scoring 78 on a biology test (mean 70, SD 10) has z = +0.8. The same student scoring 85 on a history test (mean 80, SD 3.33) has z = +1.5. Despite the raw score difference, the student performed relatively better in history — a fact invisible without z-score conversion.
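The comparison above takes two lines of code. A minimal sketch using the example scores and class statistics from the text:

```python
def z_score(x, mean, sd):
    """Standardise a raw score: how many SDs it lies above or below the mean."""
    return (x - mean) / sd

biology = z_score(78, 70, 10)    # +0.80
history = z_score(85, 80, 3.33)  # ~ +1.50

print(f"biology z = {biology:+.2f}, history z = {history:+.2f}")
```

The raw history score is only 5 points above its mean, but dividing by the much smaller SD reveals the stronger relative performance.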
Z-scores are foundational in statistics, psychology, education, medicine, and quality control. They connect directly to probabilities under the normal distribution, allowing you to calculate the percentage of a population above, below, or between any two values.
When z-scores are plotted, they follow the standard normal distribution — a bell-shaped curve with mean = 0 and standard deviation = 1. The area under this curve represents probability: the area to the left of a z-score equals the percentile rank (the percentage of values falling below that z-score).
| Z-Score | Percentile | % Above | Interpretation |
|---|---|---|---|
| −3.0 | 0.13% | 99.87% | Extremely below average |
| −2.0 | 2.28% | 97.72% | Well below average |
| −1.5 | 6.68% | 93.32% | Below average |
| −1.0 | 15.87% | 84.13% | Slightly below average |
| −0.5 | 30.85% | 69.15% | Low average |
| 0.0 | 50.00% | 50.00% | Exactly at mean |
| +0.5 | 69.15% | 30.85% | High average |
| +1.0 | 84.13% | 15.87% | Slightly above average |
| +1.5 | 93.32% | 6.68% | Above average |
| +2.0 | 97.72% | 2.28% | Well above average |
| +3.0 | 99.87% | 0.13% | Extremely above average |
These percentiles come from the cumulative distribution function (CDF) of the normal distribution. In practice, you look them up in a z-table or calculate them using software (Excel's NORM.S.DIST, Python's scipy.stats.norm.cdf, or this calculator).
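If scipy or Excel is not at hand, the standard normal CDF can be computed from the error function in Python's standard library. A small sketch that reproduces the table's percentile column:

```python
import math

def normal_cdf(z):
    """Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percentile(z):
    """Percentile rank: the percentage of values below this z-score."""
    return 100 * normal_cdf(z)

print(f"z = +1.0 -> {percentile(1.0):.2f}th percentile")   # ~84.13
print(f"z = -2.0 -> {percentile(-2.0):.2f}th percentile")  # ~2.28
```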
One of the most widely cited facts in statistics, the empirical rule (the 68–95–99.7 rule) describes the percentage of data falling within 1, 2, and 3 standard deviations of the mean in a normal distribution: approximately 68% lies within ±1σ, 95% within ±2σ, and 99.7% within ±3σ.
Equivalently, only about 5% of normally distributed data falls more than 2 standard deviations from the mean, and only 0.27% (about 1 in 370) falls beyond 3 standard deviations. This is why ±2σ is a common threshold for "significantly different from average" and ±3σ flags extreme outliers.
| Range | Data Included | Data Excluded | 1-in-N rarity |
|---|---|---|---|
| ±1σ | 68.27% | 31.73% | ~1 in 3 |
| ±2σ | 95.45% | 4.55% | ~1 in 22 |
| ±3σ | 99.73% | 0.27% | ~1 in 370 |
| ±4σ | 99.9937% | 0.0063% | ~1 in 15,787 |
| ±6σ | 99.9999998% | 0.0000002% | ~1 in 506,842,372 |
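Each row of the table can be derived from the error function: the fraction of a normal distribution inside ±kσ is erf(k/√2). A short sketch:

```python
import math

def coverage(k):
    """Fraction of a normal distribution within ±k standard deviations."""
    return math.erf(k / math.sqrt(2))  # equals Phi(k) - Phi(-k)

for k in (1, 2, 3, 4, 6):
    inside = coverage(k)
    print(f"±{k}σ: {inside:.6%} inside, ~1 in {1 / (1 - inside):,.0f} outside")
```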
Six Sigma quality management aims to reduce manufacturing defects to fewer than 3.4 per million opportunities — a level that allows for a 1.5σ process shift over time, making it roughly equivalent to ±4.5σ on a static process. The aspiration of "six sigma" performance is to make defects vanishingly rare.
Standardised tests — SAT, ACT, IQ tests, GRE, GMAT — are designed to produce normally distributed scores that can be meaningfully converted to percentiles using z-scores. This enables comparison across different test forms (which may vary slightly in difficulty) and across years.
IQ Scores: Designed with mean = 100 and standard deviation = 15. An IQ of 130 has z = (130−100)/15 = +2.0, placing the person at the 97.7th percentile. An IQ of 145 has z = +3.0, placing them at the 99.87th percentile (roughly 1 in 740 people).
SAT Scores: Each section (Evidence-Based Reading/Writing and Math) has mean ~500 and SD ~100. A math score of 680 has z = (680−500)/100 = +1.8, approximately the 96th percentile. A combined score of 1400 (z ≈ +2.0 against a nominal mean of 1000 and SD of 200) places a student in roughly the top 5% of test takers.
| Test | Mean | SD | Score of 1σ above mean | Percentile |
|---|---|---|---|---|
| IQ | 100 | 15 | 115 | 84th |
| SAT (each section) | 500 | 100 | 600 | 84th |
| ACT | 21 | 5 | 26 | 84th |
| GRE Verbal | 150 | 8.5 | 158.5 | 84th |
In manufacturing and process quality control, z-scores are used to measure process capability — how well a production process falls within specification limits. The process capability index Cp and Cpk are derived from z-score concepts.
Process capability: If a process has mean μ and standard deviation σ, and specifications require the output to fall between a Lower Spec Limit (LSL) and an Upper Spec Limit (USL), then Cp = (USL − LSL) / 6σ measures potential capability (spec width relative to the ±3σ process spread), and Cpk = min[(USL − μ) / 3σ, (μ − LSL) / 3σ] measures actual capability, penalising a mean that drifts off-centre.
A Cpk ≥ 1.33 is typically required in automotive and aerospace manufacturing (equivalent to ±4σ process capability). Medical device manufacturing often requires Cpk ≥ 1.67 (±5σ). The target of "Six Sigma" processes is Cpk = 2.0.
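The Cp and Cpk formulas translate directly into code. A minimal sketch; the process mean, SD, and spec limits below are hypothetical example values:

```python
def process_capability(mu, sigma, lsl, usl):
    """Cp: potential capability (spec width vs ±3σ spread).
    Cpk: actual capability, penalising an off-centre mean."""
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
    return cp, cpk

# Hypothetical process: mean 10.1 mm, SD 0.05 mm, specs 9.85-10.15 mm
cp, cpk = process_capability(10.1, 0.05, 9.85, 10.15)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Here Cp = 1.00 but Cpk = 0.33: the spec window is wide enough in principle, but the off-centre mean leaves the process far short of the 1.33 typically required.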
Medical laboratories report test results relative to reference ranges, which are typically defined as the central 95% of a healthy population — corresponding to z-scores between −1.96 and +1.96. A result outside this range is flagged as "abnormal," though this simply means it is statistically unusual, not necessarily clinically concerning.
Bone density (DEXA scan): Results are reported as T-scores (comparison to a young-adult reference population) and Z-scores (comparison to an age- and sex-matched reference). By WHO criteria, a T-score of −1.0 or above is normal, between −1.0 and −2.5 indicates osteopenia, and −2.5 or below indicates osteoporosis; a Z-score at or below −2.0 is considered below the expected range for age.
Growth charts: Children's height, weight, and head circumference are plotted as z-scores relative to age-sex norms. A child at the 50th percentile has z = 0; at the 97th percentile z = +1.88; at the 3rd percentile z = −1.88. Paediatric z-score cutoffs guide nutritional and developmental assessments.
Haematology: Blood counts (haemoglobin, white cells, platelets) have reference ranges expressed as mean ± 2SD. Values beyond these ranges trigger clinical review, though individual variation and laboratory differences mean clinical context is essential.
Z-scores form the basis of the z-test, one of the most commonly used hypothesis tests in statistics. When testing whether a sample mean differs significantly from a known population mean, you calculate:
z = (x̄ − μ₀) / (σ / √n)
where x̄ is the sample mean, μ₀ is the hypothesised population mean, σ is the known population standard deviation, and n is the sample size.
If |z| > 1.96, the result is statistically significant at the α = 0.05 level (two-tailed). If |z| > 2.576, it is significant at α = 0.01. These critical values come directly from the normal distribution: 95% of the distribution falls within ±1.96 SD, and 99% within ±2.576 SD.
| Significance Level (α) | Critical z-Value (two-tailed) | Interpretation |
|---|---|---|
| 0.10 (10%) | ±1.645 | 90% confidence |
| 0.05 (5%) | ±1.960 | 95% confidence (standard) |
| 0.01 (1%) | ±2.576 | 99% confidence |
| 0.001 (0.1%) | ±3.291 | 99.9% confidence |
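The z-test calculation above is short enough to sketch directly; the sample numbers below are hypothetical example values:

```python
import math

def z_test(sample_mean, mu0, sigma, n):
    """One-sample z-test statistic: z = (x-bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical example: 36 scores averaging 74 vs a claimed mean of 70, known sigma = 12
z = z_test(74, 70, 12, 36)
print(f"z = {z:.2f}, significant at alpha = 0.05 (two-tailed): {abs(z) > 1.96}")
```

With z = 2.00 > 1.96, the sample mean differs significantly from the hypothesised mean at the 5% level.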
Z-scores and percentile calculations derived from them assume the underlying data follows a normal (Gaussian) distribution. Many real-world datasets violate this assumption: incomes and house prices are right-skewed, financial returns have heavy tails, reaction times are bounded below, and count data are discrete.
Before applying z-score analysis, always check that your data is approximately normally distributed using histograms, Q-Q plots, or formal normality tests (Shapiro-Wilk, Anderson-Darling). If the data is non-normal, consider transformations (log, square root) or non-parametric alternatives.
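As a quick pure-Python screen, sample skewness and excess kurtosis should both be near zero for approximately normal data. This is a rough heuristic, not a substitute for the formal tests named above:

```python
import random
import statistics

def skew_kurtosis(data):
    """Sample skewness and excess kurtosis; both are ~0 for normal data."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    skew = sum((x - mean) ** 3 for x in data) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in data) / (n * sd ** 4) - 3
    return skew, kurt

random.seed(0)
normal_ish = [random.gauss(0, 1) for _ in range(10_000)]
print(skew_kurtosis(normal_ish))  # both values close to 0
```

Strongly skewed data (say, skewness above 1 in magnitude) is a signal to transform the data or switch to non-parametric methods before trusting z-score percentiles.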
A z-score of 1.5 means the value is 1.5 standard deviations above the mean, placing it at approximately the 93rd percentile. About 93.3% of values in a normal distribution fall below this point, and 6.7% fall above it.
"Good" depends on context. For test scores or performance metrics, higher z-scores are better. For risk indicators (cholesterol, blood pressure), z-scores near 0 are healthiest. In quality control, z-scores beyond ±3 flag defects or outliers. There is no universally "good" z-score — it depends on what is being measured.
Subtract the mean from your value, then divide by the standard deviation: z = (x − μ) / σ. Example: score of 85, mean 70, SD 10 → z = (85−70)/10 = 1.5. This means the score is 1.5 standard deviations above the class average.
The z-score corresponding to the 95th percentile is approximately +1.645 (one-tailed). This is also the critical value for a one-tailed significance test at α = 0.05. For the two-tailed 95% range (i.e., central 95% of the distribution), the cutoffs are ±1.96.
Yes. A negative z-score means the value is below the mean. A z-score of −1.0 means the value is one standard deviation below the mean, at the 15.87th percentile. Z-scores range from −∞ to +∞, though values beyond ±4 are extremely rare in normally distributed data.
Both standardise data relative to mean and standard deviation. A z-score assumes the population standard deviation (σ) is known. A t-score (or t-statistic) uses the sample standard deviation (s) as an estimate when σ is unknown, and follows the heavier-tailed t-distribution. For large samples (n > 30), t and z are nearly identical.
The Altman Z-score predicts corporate bankruptcy risk using a weighted combination of financial ratios. In risk management, z-scores set the quantile used in Value at Risk (VaR) calculations — the loss threshold exceeded with a given small probability. In algorithmic trading, z-scores of price spreads identify mean-reversion opportunities (pairs trading).
Approximately 95.45% of data falls within ±2σ of the mean in a normal distribution (the empirical rule). The exact figure is 95.450%. The complementary 4.550% lies beyond ±2σ — 2.275% in each tail. This is why ±2σ is a common rule-of-thumb threshold for statistical significance: it closely approximates the exact ±1.96 cutoff at α = 0.05 (two-tailed).
Look up the z-score in a standard normal table (z-table), which gives the cumulative probability. Multiply by 100 for the percentile. For example, z = 1.0 → 0.8413 → 84th percentile. Alternatively, use the formula: percentile = Φ(z) × 100, where Φ is the standard normal CDF. Excel: =NORM.S.DIST(z,TRUE)×100.
In Six Sigma quality management, z-scores measure process capability. A process running at ±3σ (z = 3) produces about 2,700 defects per million with no drift. At ±6σ (z = 6) it produces just 3.4 defects per million (accounting for typical 1.5σ process drift). The Cp and Cpk indices directly use z-score concepts to quantify how well a process meets specifications.
One of the most common practical applications of z-scores is outlier detection — identifying data points that are unusually far from the mean and may represent errors, extraordinary events, or genuinely unusual observations requiring investigation.
The standard threshold for flagging outliers is |z| > 3. Values more than 3 standard deviations from the mean are expected in only 0.27% of observations under a normal distribution — roughly 1 in 370 data points. In a dataset of 1,000 measurements, you would expect only ~3 values beyond ±3σ by chance. If you find 20 such values, something unusual is happening — equipment malfunction, data entry errors, or genuine extreme observations.
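The |z| > 3 rule takes only a few lines to apply. A minimal sketch with hypothetical sensor readings (note that a large outlier inflates the SD, which can mask borderline outliers in small datasets):

```python
import statistics

def flag_outliers(data, threshold=3.0):
    """Return (value, z-score) pairs whose |z| exceeds the threshold."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [(x, (x - mean) / sd) for x in data if abs(x - mean) / sd > threshold]

# Hypothetical readings clustered near 10.0, plus one suspect value
readings = [10.0] * 16 + [9.8, 10.2, 9.9, 10.1] + [48.0]
print(flag_outliers(readings))  # only 48.0 is flagged
```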
More stringent criteria are used in specific fields:
| Z-Score Threshold | % Flagged (normal) | Used In |
|---|---|---|
| |z| > 2.0 | 4.55% | Initial data screening |
| |z| > 2.5 | 1.24% | Medical reference ranges |
| |z| > 3.0 | 0.27% | Quality control, outlier detection |
| |z| > 4.0 | 0.0063% | Process defect analysis |
| |z| > 5.0 | 0.00006% | Particle physics discovery claim |
Important caveat: real-world data often has heavier tails than the normal distribution predicts (leptokurtic distributions). Always inspect outliers manually — a z-score of 4 might be a data entry error (48 recorded as 4.8) or a genuine extreme value with important meaning. Never automatically delete outliers without investigation.
In finance, z-scores have multiple critical applications beyond academic statistics. The most famous is the Altman Z-Score (1968), a bankruptcy prediction model that combines five financial ratios into a single discriminant score:
Z = 1.2×(Working Capital/Total Assets) + 1.4×(Retained Earnings/Total Assets) + 3.3×(EBIT/Total Assets) + 0.6×(Market Cap/Total Liabilities) + 1.0×(Revenue/Total Assets)
Altman Z-Score interpretation: Z > 2.99 = Safe zone; 1.81–2.99 = Grey zone; Z < 1.81 = Distress zone (high bankruptcy risk). The model correctly predicted bankruptcy in 94% of cases in original studies and remains widely used by credit analysts and investors today.
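The 1968 formula and its zone cutoffs are simple to encode. A minimal sketch; the balance-sheet figures below are hypothetical example values, not data from a real firm:

```python
def altman_z(wc, re, ebit, mve, sales, total_assets, total_liabilities):
    """Original (1968) Altman Z-score for publicly traded manufacturers."""
    ta = total_assets
    return (1.2 * wc / ta + 1.4 * re / ta + 3.3 * ebit / ta
            + 0.6 * mve / total_liabilities + 1.0 * sales / ta)

def zone(z):
    """Map a Z-score to Altman's risk zones."""
    return "safe" if z > 2.99 else "distress" if z < 1.81 else "grey"

# Hypothetical firm (all figures in $M)
z = altman_z(wc=50, re=120, ebit=60, mve=400,
             sales=500, total_assets=500, total_liabilities=200)
print(f"Z = {z:.2f} ({zone(z)} zone)")
```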
Value at Risk (VaR): In portfolio risk management, VaR uses z-scores to quantify potential losses. The 1-day 95% VaR for a portfolio with daily return mean μ and standard deviation σ is: VaR = −(μ + z × σ) where z = −1.645 (the 5th percentile). If a $1M portfolio has daily μ = 0% and σ = 1%, VaR at 95% confidence = 1.645% × $1M = $16,450. This means there is a 5% chance of losing more than $16,450 in a single day.
| Confidence Level | Z-Score Used | Interpretation |
|---|---|---|
| 90% | −1.282 | Loss exceeded on ~10% of trading days |
| 95% | −1.645 | Loss exceeded on ~5% of trading days |
| 99% | −2.326 | Loss exceeded on ~1% of trading days |
| 99.9% | −3.090 | Loss exceeded on ~0.1% of trading days |
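The $1M example above can be reproduced directly from the parametric VaR formula. A minimal sketch, assuming normally distributed daily returns:

```python
def parametric_var(portfolio_value, mu, sigma, z):
    """Parametric (variance-covariance) VaR: VaR = -(mu + z * sigma) * value.
    z is the quantile of the left tail, e.g. -1.645 for 95% confidence."""
    return -(mu + z * sigma) * portfolio_value

# $1M portfolio, daily mean return 0%, daily SD 1%, 95% confidence
var_95 = parametric_var(1_000_000, mu=0.0, sigma=0.01, z=-1.645)
print(f"1-day 95% VaR: ${var_95:,.0f}")  # $16,450
```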
<h2>Calculating Z-Scores with Sample Data</h2>
<p>When working with a sample (rather than a known population), you estimate the population parameters from the sample. The sample mean (x̄) estimates μ, and the sample standard deviation (s) estimates σ. The z-score formula remains the same: z = (x − x̄) / s.</p>
<p>However, with small samples, the resulting z-scores follow the t-distribution (not the normal distribution) due to the added uncertainty in estimating σ. The t-distribution has heavier tails, reflecting this greater uncertainty. For samples of 30 or more, the t-distribution and normal distribution are nearly identical, and z-scores from either calculation are approximately equivalent.</p>
<p>When you have a dataset and want to standardise all values (convert the entire dataset to z-scores), this is called <strong>feature scaling</strong> or <strong>standardisation</strong> in machine learning. It is a preprocessing step that puts all features on the same scale (mean = 0, SD = 1), preventing features with larger absolute values from dominating distance-based algorithms (KNN, SVM, neural networks). After standardisation, each feature's z-scores are directly comparable regardless of original units or scale.</p>
<p>To standardise a dataset in Python: <code>from sklearn.preprocessing import StandardScaler; scaler = StandardScaler(); X_scaled = scaler.fit_transform(X)</code>. In Excel: for each value in a column, compute <code>=STANDARDIZE(value, AVERAGE(range), STDEV(range))</code>.</p>
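<p>Standardisation can also be done without any external libraries. A minimal sketch using only Python's standard library (the scores are example values):</p>

```python
import statistics

def standardise(values):
    """Convert raw values to z-scores (mean 0, SD 1), using the sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

scores = [70, 75, 80, 85, 90]
print([round(z, 2) for z in standardise(scores)])
```

<p>Note one subtlety: sklearn's <code>StandardScaler</code> divides by the population SD (ddof = 0), while <code>statistics.stdev</code> is the sample SD (ddof = 1), so the two outputs differ slightly for small datasets.</p>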