DigiCalcs

How to Calculate R-squared: Step-by-Step Guide

Learn to manually calculate R-squared (coefficient of determination) step-by-step. Understand the formula, variables, and interpret model fit.

Step-by-Step Instructions

Step 1: Gather Your Data and Calculate the Mean of Observed Values

First, identify all observed (y_i) and predicted (ŷ_i) values from your regression analysis. Then, sum all observed y_i values and divide by the total number of data points (n) to calculate the mean of observed values (ȳ).

Step 2: Calculate the Sum of Squares of Residuals (SS_res)

For each data point, subtract the predicted value (ŷ_i) from the observed value (y_i), then square the result. Sum all these squared differences to obtain the Sum of Squares of Residuals (SS_res). This represents the variation the model leaves unexplained.

Step 3: Calculate the Total Sum of Squares (SS_tot)

For each data point, subtract the mean of observed values (ȳ) from the observed value (y_i), then square the result. Sum all these squared differences to obtain the Total Sum of Squares (SS_tot). This represents the total variation in the dependent variable.

Step 4: Compute the R-squared Value

Finally, apply the R-squared formula: R² = 1 - (SS_res / SS_tot). Substitute the calculated SS_res and SS_tot values into the formula to determine your R-squared value, which quantifies your model's goodness of fit.
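
The four steps above can be sketched as a single Python function. This is a minimal illustration rather than a reference implementation; the function name `r_squared` and its error handling are assumptions made for this example.

```python
def r_squared(observed, predicted):
    """Return R^2 = 1 - SS_res / SS_tot for paired observed/predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one data point is required")

    # Step 1: mean of the observed values
    n = len(observed)
    mean_y = sum(observed) / n

    # Step 2: sum of squared residuals (observed minus predicted)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

    # Step 3: total sum of squares (observed minus mean)
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observed values are equal")

    # Step 4: apply the formula
    return 1 - ss_res / ss_tot
```

The explicit check on SS_tot is a design choice for this sketch: when every observed value is identical, the denominator is zero and the formula has no defined result.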

The R-squared (R²) value, also known as the coefficient of determination, is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model. It is a key indicator of how well your model fits the observed data. For an ordinary least squares (OLS) model with an intercept, evaluated on the data it was fit to, values range from 0 to 1, and a higher value generally indicates a better fit.

This guide will walk you through the manual calculation of R-squared, detailing each component of the formula and providing a practical example.

Prerequisites

Before you begin, ensure you have the following data points from your regression analysis:

  • Observed Values (y_i): The actual measured values of the dependent variable.
  • Predicted Values (ŷ_i): The values of the dependent variable predicted by your regression model.
  • Mean of Observed Values (ȳ): The average of all observed values of the dependent variable.

Understanding the R-squared Formula

The fundamental formula for R-squared is:

$$ R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} $$

Where:

  • SS_res (Sum of Squares of Residuals): The sum of the squared differences between the observed values and the values predicted by your model. It quantifies the variation the model leaves unexplained. $$ SS_{\text{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

  • SS_tot (Total Sum of Squares): The sum of the squared differences between the observed values and their mean. It represents the total variation in the dependent variable. $$ SS_{\text{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2 $$

Variable Legend:

  • n: The number of data points.
  • y_i: The i-th observed value of the dependent variable.
  • ŷ_i (y-hat_i): The i-th predicted value of the dependent variable from the regression model.
  • ȳ (y-bar): The mean of all observed values of the dependent variable.

Geometric Interpretation

Conceptually, R-squared compares the variance around the regression line (SS_res) to the total variance around the mean (SS_tot). Imagine a scatter plot of your data points. SS_tot represents how much the observed data points vary from a horizontal line at the mean of Y. SS_res represents how much the observed data points vary from the regression line. If your regression line perfectly explains the variance, SS_res would be 0, and R-squared would be 1. If your regression line explains no more variance than simply using the mean, SS_res would be equal to SS_tot, and R-squared would be 0.
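
The two boundary cases described above can be checked numerically. The sketch below uses a small set of hypothetical observed values (not taken from this guide) and a helper named `r_squared`, which is an assumption of this example.

```python
def r_squared(observed, predicted):
    # R^2 = 1 - SS_res / SS_tot
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

observed = [2.0, 4.0, 9.0]              # hypothetical values for illustration
mean_y = sum(observed) / len(observed)  # 5.0

# Case 1: predictions match the observations exactly, so SS_res = 0
perfect = r_squared(observed, observed)

# Case 2: every prediction is the mean, so SS_res equals SS_tot
baseline = r_squared(observed, [mean_y] * len(observed))

print(perfect, baseline)  # 1.0 0.0
```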

Worked Example

Let's calculate R-squared for a simple dataset. Suppose we have the following observed (y_i) and predicted (ŷ_i) values from a regression model:

| Data Point (i) | Observed (y_i) | Predicted (ŷ_i) |
|---|---|---|
| 1 | 6 | 5.5 |
| 2 | 8 | 7.8 |
| 3 | 10 | 10.1 |
| 4 | 12 | 12.3 |
| 5 | 14 | 13.9 |

Step 1: Calculate the Mean of Observed Values (ȳ)

First, sum all observed y_i values and divide by the number of data points (n=5).

$$ \bar{y} = \frac{6 + 8 + 10 + 12 + 14}{5} = \frac{50}{5} = 10 $$

So, ȳ = 10.
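
As a quick check, the same mean can be computed in Python; the variable names here are chosen only for this illustration.

```python
# Observed values from the worked example
observed = [6, 8, 10, 12, 14]

n = len(observed)
mean_y = sum(observed) / n  # 50 / 5

print(mean_y)  # 10.0
```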

Step 2: Calculate the Sum of Squares of Residuals (SS_res)

Now, we'll calculate (y_i - ŷ_i)² for each data point and sum them up.

  • (6 - 5.5)² = (0.5)² = 0.25
  • (8 - 7.8)² = (0.2)² = 0.04
  • (10 - 10.1)² = (-0.1)² = 0.01
  • (12 - 12.3)² = (-0.3)² = 0.09
  • (14 - 13.9)² = (0.1)² = 0.01

$$ SS_{\text{res}} = 0.25 + 0.04 + 0.01 + 0.09 + 0.01 = 0.40 $$
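
The same sum can be reproduced in a few lines of Python. This sketch uses ordinary floating-point numbers, so the total is rounded to two decimals before printing; the raw float may differ from 0.40 by a tiny rounding error.

```python
# Observed and predicted values from the worked example
observed = [6, 8, 10, 12, 14]
predicted = [5.5, 7.8, 10.1, 12.3, 13.9]

# Squared residual for each data point: (y_i - y_hat_i)^2
squared_residuals = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]

ss_res = sum(squared_residuals)
print(round(ss_res, 2))  # 0.4
```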

Step 3: Calculate the Total Sum of Squares (SS_tot)

Next, we'll calculate (y_i - ȳ)² for each data point and sum them up.

  • (6 - 10)² = (-4)² = 16
  • (8 - 10)² = (-2)² = 4
  • (10 - 10)² = (0)² = 0
  • (12 - 10)² = (2)² = 4
  • (14 - 10)² = (4)² = 16

$$ SS_{\text{tot}} = 16 + 4 + 0 + 4 + 16 = 40 $$
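
A short Python check of this step, again with illustrative variable names:

```python
# Observed values from the worked example
observed = [6, 8, 10, 12, 14]
mean_y = sum(observed) / len(observed)  # 10.0

# Squared deviation of each observation from the mean: (y_i - y_bar)^2
squared_deviations = [(y - mean_y) ** 2 for y in observed]

ss_tot = sum(squared_deviations)
print(ss_tot)  # 40.0
```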

Step 4: Compute R-squared

Finally, plug SS_res and SS_tot into the R-squared formula:

$$ R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{0.40}{40} = 1 - 0.01 = 0.99 $$

For this example, R-squared is 0.99. This indicates that 99% of the variance in the observed 'y' values can be explained by our regression model, suggesting an excellent fit.
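
To confirm the hand arithmetic end to end, the whole example can be rerun with Python's `decimal` module, which works in exact decimal arithmetic and so reproduces the figures above without floating-point rounding. Using `Decimal` here is a choice made for this sketch; plain floats give the same answer to within a tiny rounding error.

```python
from decimal import Decimal

# Values from the worked example, entered as exact decimals
observed = [Decimal(v) for v in ("6", "8", "10", "12", "14")]
predicted = [Decimal(v) for v in ("5.5", "7.8", "10.1", "12.3", "13.9")]

mean_y = sum(observed) / len(observed)                             # 10
ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))    # 0.40
ss_tot = sum((y - mean_y) ** 2 for y in observed)                  # 40

r2 = 1 - ss_res / ss_tot
print(r2)  # 0.99
```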

Common Pitfalls and Considerations

  • Negative R-squared: R-squared typically ranges from 0 to 1, but the formula yields a negative value whenever SS_res exceeds SS_tot, that is, whenever the model predicts worse than simply using the mean of the dependent variable. This cannot happen for an ordinary least squares (OLS) fit with an intercept evaluated on its own training data; it usually occurs when the model omits the intercept, was fit by a different method, or is evaluated on data it was not fit to.
  • Correlation vs. R-squared: For simple linear regression fit by OLS with an intercept, R-squared equals the square of the Pearson correlation coefficient (r) between the predictor and the dependent variable. For multiple regression, R-squared instead equals the squared correlation between the observed and fitted values, not the squared correlation with any single predictor.
  • Over-reliance: A high R-squared does not necessarily imply that the model is correct, nor does it guarantee predictive accuracy. Always check diagnostic plots (e.g., residual plots) and apply domain knowledge.
  • Adjusted R-squared: For multiple regression, Adjusted R-squared accounts for the number of predictors in the model and is often preferred because it penalizes each added predictor, so it rises only when a new variable improves the fit by more than would be expected by chance.
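
Two of the points in this list can be illustrated with a short sketch. The first part shows a negative R-squared using hypothetical predictions that run opposite to the data. The second part applies the standard Adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors. The function names are assumptions, and p = 1 for the worked example is also an assumption, since this guide does not state how many predictors produced those predictions.

```python
def r_squared(observed, predicted):
    # R^2 = 1 - SS_res / SS_tot
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    # n = number of data points, p = number of predictors
    if n - p - 1 <= 0:
        raise ValueError("need more data points than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical predictions that are worse than predicting the mean (2.0)
negative_r2 = r_squared([1, 2, 3], [3, 2, 1])
print(negative_r2)  # -3.0

# Worked-example R^2 of 0.99 with n = 5 and an assumed single predictor
adjusted = adjusted_r_squared(0.99, n=5, p=1)
print(round(adjusted, 4))  # 0.9867
```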

When to Use an R-squared Calculator

Manually calculating R-squared is crucial for understanding its underlying mechanics. However, for larger datasets, models with many predictors, or when performing iterative model adjustments, manual calculation becomes time-consuming and prone to error. In such scenarios, utilizing an R-squared calculator or statistical software is highly recommended for efficiency and accuracy. These tools can process complex datasets rapidly, allowing you to focus on model interpretation and refinement.

Conclusion

Calculating R-squared by hand reinforces your understanding of how this vital statistic measures the explanatory power of your regression model. By following these steps, you can confidently determine the proportion of variance in your dependent variable explained by your model, providing valuable insight into its goodness of fit.

© 2026 DigiCalcs