Descriptive statistics provide a concise summary of a dataset, enabling quick insights into its central tendency, dispersion, and shape. Mastering these calculations by hand is fundamental for understanding the underlying principles before leveraging computational tools.

This guide walks through the manual computation of the most common descriptive statistics: mean, median, mode, variance, standard deviation, and percentiles.

Prerequisites

To follow this guide, you need basic arithmetic (addition, subtraction, multiplication, division, square roots) and familiarity with sorting numerical data.

Key Descriptive Statistics

Mean (Arithmetic Mean)

The mean, often denoted as $\bar{x}$ (for a sample) or $\mu$ (for a population), is the sum of all values divided by the count of values. It represents the 'average' value in the dataset.

Formula:

$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$

Where:

$\sum_{i=1}^{n} x_i$ is the sum of all data points.
$n$ is the number of data points.
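The formula above can be sketched in a few lines of Python. The function name `mean` and the small dataset are illustrative assumptions, not part of the guide.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


# Hypothetical data: (2 + 4 + 9) / 3 = 15 / 3
print(mean([2, 4, 9]))  # 5.0
```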

Median

The median is the middle value of a dataset when it is ordered from least to greatest. It is less affected by extreme outliers than the mean.

Procedure:

Order the dataset from smallest to largest.
If the number of data points ($n$) is odd, the median is the value exactly in the middle.
If $n$ is even, the median is the average of the two middle values.
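As a minimal sketch of this procedure, the function below sorts the data first and then branches on whether $n$ is odd or even. The name `median` and the sample inputs are assumptions chosen for illustration.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    if not values:
        raise ValueError("median requires at least one data point")
    ordered = sorted(values)  # step 1: order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the single middle value
        return ordered[mid]
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # sorted: [1, 5, 7] -> 5
print(median([7, 1, 5, 3]))  # sorted: [1, 3, 5, 7] -> (3 + 5) / 2 = 4.0
```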

Mode

The mode is the value that appears most frequently in a dataset. A dataset can have one mode (unimodal), multiple modes (multimodal), or no mode (if all values appear with the same frequency).

Procedure:

Count the frequency of each unique value in the dataset.
Identify the value(s) with the highest frequency.
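The counting procedure might look like the sketch below, which uses `collections.Counter` from the Python standard library. The function name `modes` is hypothetical. Returning an empty list when every distinct value is equally frequent follows the "no mode" convention described above. Treating a dataset with only one distinct value as having that value as its mode is an extra assumption.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, or [] if there is no mode."""
    counts = Counter(values)  # frequency of each unique value
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: several distinct values, all equally frequent -> no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 1, 3, 2]))     # [3]     (unimodal)
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (multimodal)
print(modes([1, 2, 3]))        # []      (no mode)
```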

Variance

Variance measures the average squared deviation of each data point from the mean. It quantifies the spread of the data. There are distinct formulas for population variance ($\sigma^2$) and sample variance ($s^2$).

Population Variance Formula:

$$ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} $$

Sample Variance Formula:

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} $$

Where:

$x_i$ is each individual data point.
$\mu$ (population) or $\bar{x}$ (sample) is the mean.
$N$ (population) or $n$ (sample) is the number of data points.
Using $n-1$ rather than $n$ in the denominator of the sample variance is known as Bessel's correction; it makes $s^2$ an unbiased estimate of the population variance.
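To see how the two denominators change the result, here is a small sketch. The `variance` function and its `sample` flag are illustrative names, and the dataset is hypothetical.

```python
def variance(values, sample=True):
    """Sample variance (divide by n - 1) or population variance (divide by n)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    m = sum(values) / n
    sum_sq = sum((x - m) ** 2 for x in values)  # sum of squared deviations
    return sum_sq / (n - 1 if sample else n)


data = [2, 4, 9]  # mean 5; squared deviations 9 + 1 + 16 = 26
print(variance(data, sample=False))  # population: 26 / 3, about 8.67
print(variance(data, sample=True))   # sample:     26 / 2 = 13.0
```

For the same data, the sample variance is always the larger of the two, and the gap shrinks as $n$ grows.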

Standard Deviation

The standard deviation is the square root of the variance. It measures the typical distance between a data point and the mean, expressed in the same units as the data itself, making it more interpretable than variance.

Population Standard Deviation Formula:

$$ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} $$

Sample Standard Deviation Formula:

$$ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} $$
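A minimal sketch of both formulas, assuming a hypothetical helper named `std_dev` and a made-up dataset chosen so the population result is a whole number:

```python
import math


def std_dev(values, sample=True):
    """Square root of the variance, in the same units as the data."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    m = sum(values) / n
    sum_sq = sum((x - m) ** 2 for x in values)
    denominator = n - 1 if sample else n
    return math.sqrt(sum_sq / denominator)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5; sum of squared deviations 32
print(std_dev(data, sample=False))  # sqrt(32 / 8) = 2.0
print(std_dev(data, sample=True))   # sqrt(32 / 7), about 2.14
```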

Percentiles

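Percentiles indicate the value below which a given percentage of observations fall. For example, the 25th percentile (the first quartile, Q1) is the value below which 25% of the data lies. Several conventions exist for computing percentiles; the procedure below is one that is convenient by hand, so other methods may give slightly different values.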

Procedure (for the Pth percentile):

Order the dataset from smallest to largest.
Calculate the rank (position) $L = (P/100) \times n$.
If $L$ is an integer: The Pth percentile is the average of the value at position $L$ and the value at position $L+1$.
If $L$ is not an integer: Round $L$ up to the nearest whole number. The Pth percentile is the value at this new position.
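The rank rule above could be implemented roughly as follows. The function name `percentile` is an assumption, as is the restriction to $0 < P < 100$, which keeps position $L+1$ inside the dataset. The sketch uses `divmod` on $P \times n$ so that the "is $L$ an integer?" test does not depend on floating-point rounding.

```python
def percentile(values, p):
    """Pth percentile using the rank rule L = (P / 100) * n described above."""
    if not values:
        raise ValueError("percentile requires at least one data point")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # L = p * n / 100; split into whole part and remainder to test for an integer rank
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        k = int(whole)
        # integer rank: average the values at positions L and L + 1 (1-based)
        return (ordered[k - 1] + ordered[k]) / 2
    # non-integer rank: round up and take the value at that position (1-based)
    k = int(whole) + 1
    return ordered[k - 1]


data = [4, 8, 15, 16, 23, 42]  # hypothetical data, n = 6
print(percentile(data, 50))  # L = 3 (integer)  -> (15 + 16) / 2 = 15.5
print(percentile(data, 25))  # L = 1.5 -> round up to 2 -> 8
```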

Worked Example: Comprehensive Calculation

Let's calculate each of the descriptive statistics above for the following dataset of exam scores:

Dataset: $X = [12, 15, 18, 20, 20, 22, 25, 28, 30, 30]$

Number of data points ($n$): 10

Step 1: Order the Data

The dataset is already in ascending order, so no sorting is needed: $[12, 15, 18, 20, 20, 22, 25, 28, 30, 30]$

Step 2: Calculate Measures of Central Tendency

Mean ($\bar{x}$):
- Sum of values: $12 + 15 + 18 + 20 + 20 + 22 + 25 + 28 + 30 + 30 = 220$
- $\bar{x} = 220 / 10 = 22$
Median:
- Since $n=10$ (even), the median is the average of the 5th and 6th values.
- 5th value = 20, 6th value = 22
- Median = $(20 + 22) / 2 = 21$
Mode:
- Values 20 and 30 both appear twice, which is more frequent than any other value.
- Modes = 20, 30 (Bimodal)
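
One way to cross-check these hand calculations is Python's standard `statistics` module. This is only a verification sketch, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

scores = [12, 15, 18, 20, 20, 22, 25, 28, 30, 30]

print(statistics.mean(scores))       # mean: 22
print(statistics.median(scores))     # median: (20 + 22) / 2 = 21.0
print(statistics.multimode(scores))  # modes: [20, 30]
```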

Step 3: Calculate Measures of Dispersion (Sample Statistics)

Variance ($s^2$):
- First, calculate $(x_i - \bar{x})$ for each point:
  - $12-22 = -10$
  - $15-22 = -7$
  - $18-22 = -4$
  - $20-22 = -2$
  - $20-22 = -2$
  - $22-22 = 0$
  - $25-22 = 3$
  - $28-22 = 6$
  - $30-22 = 8$
  - $30-22 = 8$
- Next, square each deviation $(x_i - \bar{x})^2$:
  - $(-10)^2 = 100$
  - $(-7)^2 = 49$
  - $(-4)^2 = 16$
  - $(-2)^2 = 4$
  - $(-2)^2 = 4$
  - $(0)^2 = 0$
  - $(3)^2 = 9$
  - $(6)^2 = 36$
  - $(8)^2 = 64$
  - $(8)^2 = 64$
- Sum of squared deviations: $100 + 49 + 16 + 4 + 4 + 0 + 9 + 36 + 64 + 64 = 346$
- $s^2 = \frac{346}{n-1} = \frac{346}{10-1} = \frac{346}{9} \approx 38.44$
Standard Deviation ($s$):
- $s = \sqrt{s^2} = \sqrt{346/9} \approx 6.20$
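
The dispersion results can be checked the same way. In the standard `statistics` module, `variance` and `stdev` use the $n-1$ denominator, while `pvariance` uses $N$; the population value is printed only for contrast.

```python
import statistics

scores = [12, 15, 18, 20, 20, 22, 25, 28, 30, 30]

s2 = statistics.variance(scores)        # sample variance: 346 / 9
s = statistics.stdev(scores)            # sample standard deviation
sigma2 = statistics.pvariance(scores)   # population variance: 346 / 10

print(f"{s2:.2f}")      # 38.44
print(f"{s:.2f}")       # 6.20
print(f"{sigma2:.2f}")  # 34.60
```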

Step 4: Calculate Percentiles

Let's find the 25th percentile (Q1) and 75th percentile (Q3).

25th Percentile (Q1):
- $L = (25/100) \times 10 = 2.5$
- Since $L$ is not an integer, round up to 3. The 3rd value in the ordered dataset is 18.
- Q1 = 18
75th Percentile (Q3):
- $L = (75/100) \times 10 = 7.5$
- Since $L$ is not an integer, round up to 8. The 8th value in the ordered dataset is 28.
- Q3 = 28

Common Pitfalls

Misidentifying Population vs. Sample: Always use the correct denominator ($N$ or $n-1$) for variance and standard deviation based on whether your data represents an entire population or a sample from it.
Failing to Order Data: For median and percentiles, the data must be sorted in ascending order. Incorrect ordering will lead to erroneous results.
Rounding Errors: When performing manual calculations, avoid premature rounding, especially during intermediate steps for variance and standard deviation. Keep as many decimal places as feasible until the final result.
Interpreting Mode: If a dataset has multiple modes, report all of them. If all values appear with the same frequency, report that the dataset has no mode (sometimes phrased as 'no distinct mode').
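
Two of these pitfalls are easy to demonstrate numerically. In the sketch below, the unsorted list `raw` is hypothetical data used only to show the ordering problem. The rounding comparison reuses the sum of squared deviations (346) from the worked example and rounds the variance to a whole number as a deliberately exaggerated case.

```python
import math

# Ordering pitfall: taking the middle position of unsorted data.
raw = [30, 12, 25, 20, 15]
middle = len(raw) // 2
wrong_median = raw[middle]          # 25, the middle position of the unsorted list
right_median = sorted(raw)[middle]  # 20, the middle of [12, 15, 20, 25, 30]
print(wrong_median, right_median)

# Rounding pitfall: rounding the variance before taking the square root.
sum_sq, n = 346, 10
s_full = math.sqrt(sum_sq / (n - 1))          # full precision kept until the end
s_early = math.sqrt(round(sum_sq / (n - 1)))  # variance rounded to 38 first
print(f"{s_full:.4f} vs {s_early:.4f}")
```

The two standard deviations differ by a few hundredths, an error introduced purely by rounding too early.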

When to Use a Calculator or Software

Manual calculation builds conceptual understanding, but for large datasets (e.g., $n > 30$), for many percentiles at once, or when high precision is required, use a calculator or statistical software (such as Python, R, or Excel). These tools automate the arithmetic and reduce human error, letting you focus on interpreting the data.
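
As one example of what software does differently, Python's `statistics.quantiles` (available from Python 3.8) interpolates between neighboring values and offers two methods, "exclusive" (the default) and "inclusive". The sketch below prints the quartiles of the example dataset under both. These values need not equal the hand-computed Q1 = 18 and Q3 = 28, because the rank rule used in this guide is a different convention.

```python
import statistics

scores = [12, 15, 18, 20, 20, 22, 25, 28, 30, 30]

# n=4 asks for the three cut points that split the data into quartiles
q_exclusive = statistics.quantiles(scores, n=4)  # default method="exclusive"
q_inclusive = statistics.quantiles(scores, n=4, method="inclusive")

print("exclusive:", q_exclusive)
print("inclusive:", q_inclusive)
```

When comparing manual and software results, it helps to check which percentile method the tool documents.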

By following these steps, you can accurately derive the fundamental descriptive statistics for any given dataset, providing a solid foundation for further statistical analysis.
