Evaluation Metrics for Regression#

This notebook supplements Chapter 13, Model Evaluation, of the book Machine Learning For Everyone.

1. Required Libraries#

This block imports all necessary libraries.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

2. Univariate Regression#

Generate synthetic data for univariate regression using sklearn.datasets.make_regression, then split it into training and test sets (80% / 20%).

# Univariate Regression: 1 feature
X_uni, y_uni = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
X_train_uni, X_test_uni, y_train_uni, y_test_uni = train_test_split(X_uni, y_uni, test_size=0.2, random_state=42)

The next step is to train a regression model (LinearRegression) on the training set.

# Univariate Model
model_uni = LinearRegression()
model_uni.fit(X_train_uni, y_train_uni)

Finally, to evaluate the model, we compute its predictions on the test set.

# Predictions
y_pred_uni = model_uni.predict(X_test_uni)

2.1. Mean Squared Error#

mse_uni = mean_squared_error(y_test_uni, y_pred_uni)

print("Mean Squared Error (MSE):", mse_uni)
Mean Squared Error (MSE): 0.010420222653186971
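MSE is the average of the squared differences between the true and predicted values: MSE = (1/n) * sum((y_i - y_hat_i)^2). As a minimal sketch, the same quantity can be computed by hand with NumPy and compared against mean_squared_error. The arrays y_true and y_pred below are small hypothetical values chosen only to illustrate the formula; they are not taken from the dataset above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical values, used only to illustrate the formula
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Residuals are the differences between true and predicted values
residuals = y_true - y_pred

# MSE is the mean of the squared residuals
mse_manual = np.mean(residuals ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

print("Manual MSE: ", mse_manual)
print("sklearn MSE:", mse_sklearn)
```

Both values should agree. Because the residuals are squared, MSE is expressed in squared units of the target and penalizes large errors more heavily than small ones.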

2.2. R-squared#

r2_uni = r2_score(y_test_uni, y_pred_uni)

print("R-squared (R²):", r2_uni)
R-squared (R²): 0.9999925261586983
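R-squared compares the model's squared error with that of a baseline that always predicts the mean of the true values: R² = 1 - SS_res / SS_tot. The sketch below computes it by hand and checks it against r2_score. As before, y_true and y_pred are hypothetical illustration values, not the notebook's data.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical values, used only to illustrate the formula
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Residual sum of squares: error left over after the model's predictions
ss_res = np.sum((y_true - y_pred) ** 2)

# Total sum of squares: error of a baseline that always predicts the mean
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)

r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_pred)

print("Manual R²: ", r2_manual)
print("sklearn R²:", r2_sklearn)
```

A value close to 1, as in the output above, is plausible here because the synthetic data was generated from a linear model with very little noise (noise=0.1), which LinearRegression can fit almost exactly.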

3. Multivariate Regression#

# Multivariate Regression: 3 features
X_multi, y_multi = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=42)
X_train_multi, X_test_multi, y_train_multi, y_test_multi = train_test_split(X_multi, y_multi, test_size=0.2, random_state=42)

# Multivariate Model
model_multi = LinearRegression()
model_multi.fit(X_train_multi, y_train_multi)
# Predictions
y_pred_multi = model_multi.predict(X_test_multi)

3.1. MSE and R-squared#

# MSE and R-squared
mse_multi = mean_squared_error(y_test_multi, y_pred_multi)
r2_multi = r2_score(y_test_multi, y_pred_multi)

print("Mean Squared Error (MSE):", mse_multi)
print("R-squared (R²):", r2_multi)
Mean Squared Error (MSE): 0.012384680824798728
R-squared (R²): 0.9999982803305351

3.2. Adjusted R-squared#

# Adjusted R-squared
n = len(y_test_multi)  # number of samples in the test set
p = X_test_multi.shape[1]  # number of predictors (features)
adj_r2_multi = 1 - (1 - r2_multi) * ((n - 1) / (n - p - 1))
print("Adjusted R-squared:", adj_r2_multi)
Adjusted R-squared: 0.9999979578925103
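The adjustment can be wrapped in a small reusable function. The sketch below is one possible way to do this; adjusted_r2 is a hypothetical helper name, not a scikit-learn function, and the guard against n <= p + 1 is an added assumption to avoid division by zero or a negative denominator. The example call uses the R-squared value printed in section 3.1 with n = 20 (the test set is 20% of 100 samples) and p = 3.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n samples and p predictors.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if n <= p + 1:
        # The denominator n - p - 1 must be positive
        raise ValueError("n must be greater than p + 1")
    return 1 - (1 - r2) * ((n - 1) / (n - p - 1))


# R-squared from section 3.1, with 20 test samples and 3 features
adj = adjusted_r2(0.9999982803305351, n=20, p=3)
print("Adjusted R-squared:", adj)

# With no predictors the adjustment factor is 1, so R-squared is unchanged
print(adjusted_r2(0.9, n=11, p=0))
```

Since (n - 1) / (n - p - 1) is greater than 1 whenever p > 0, the adjusted value is never higher than the plain R-squared, and the gap widens as more predictors are added relative to the number of samples.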