{ "cells": [ { "cell_type": "markdown", "id": "7ccd7951", "metadata": {}, "source": [ "(chapter13_part1)=\n", "\n", "# Evaluation Metrics for Regression\n", "\n", "- This is supplementary material for the [Machine Learning Simplified](https://themlsbook.com) book. It walks through Python implementations of the topics discussed; the detailed explanations can be found in the book. \n", "- I also assume you know Python syntax and how it works. If you don't, I highly recommend taking a break to learn the language before going forward with my code. \n", "- This material can be downloaded as a Jupyter notebook (Download button in the upper-right corner -> `.ipynb`) to reproduce the code and play around with it.\n", "\n", "\n", "This notebook is a supplement for *Chapter 13. Model Evaluation* of the **Machine Learning For Everyone** book.\n", "\n", "\n", "## 1. Required Libraries\n", "\n", "This block imports all necessary libraries." ] }, { "cell_type": "code", "execution_count": 1, "id": "34a1aebd", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.datasets import make_regression\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import mean_squared_error, r2_score" ] }, { "cell_type": "markdown", "id": "89646b9d", "metadata": {}, "source": [ "## 2. Univariate Regression\n", "\n", "We generate synthetic data for univariate regression (a single feature) with `sklearn.datasets.make_regression` and hold out 20% of the samples as a test set." ] }, { "cell_type": "code", "execution_count": 2, "id": "85f3c2c8", "metadata": {}, "outputs": [], "source": [ "# Univariate Regression: 1 feature\n", "X_uni, y_uni = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)\n", "X_train_uni, X_test_uni, y_train_uni, y_test_uni = train_test_split(X_uni, y_uni, test_size=0.2, random_state=42)" ] }, { "cell_type": "markdown", "id": "7fb0ceb8", "metadata": {}, "source": [ "The next step is to train a regression model (`LinearRegression`)." ] }, { "cell_type": "code", "execution_count": 3, "id": "acb5fd51", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LinearRegression()
" ], "text/plain": [ "LinearRegression()" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Univariate Model\n", "model_uni = LinearRegression()\n", "model_uni.fit(X_train_uni, y_train_uni)" ] }, { "cell_type": "markdown", "id": "51be7ff0", "metadata": {}, "source": [ "To evaluate the model, we first compute its predictions on the test set." ] }, { "cell_type": "code", "execution_count": 4, "id": "e46352ca", "metadata": {}, "outputs": [], "source": [ "# Predictions\n", "y_pred_uni = model_uni.predict(X_test_uni)" ] }, { "cell_type": "markdown", "id": "b76cd4f4", "metadata": {}, "source": [ "### 2.1. Mean Squared Error" ] }, { "cell_type": "code", "execution_count": 5, "id": "7d3657e6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Squared Error (MSE): 0.010420222653186971\n" ] } ], "source": [ "mse_uni = mean_squared_error(y_test_uni, y_pred_uni)\n", "\n", "print(\"Mean Squared Error (MSE):\", mse_uni)" ] }, { "cell_type": "markdown", "id": "7984f2be", "metadata": {}, "source": [ "### 2.2. R-squared" ] }, { "cell_type": "code", "execution_count": 6, "id": "2686731d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R-squared (R²): 0.9999925261586983\n" ] } ], "source": [ "r2_uni = r2_score(y_test_uni, y_pred_uni)\n", "\n", "print(\"R-squared (R²):\", r2_uni)" ] }, { "cell_type": "markdown", "id": "ba90ae68", "metadata": {}, "source": [ "## 3. Multivariate Regression\n", "\n", "We generate synthetic data for multivariate regression (three features) with `sklearn.datasets.make_regression` and hold out 20% of the samples as a test set." ] }, { "cell_type": "code", "execution_count": 7, "id": "4ffa6593", "metadata": {}, "outputs": [], "source": [ "# Multivariate Regression: 3 features\n", "X_multi, y_multi = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=42)\n", "X_train_multi, X_test_multi, y_train_multi, y_test_multi = train_test_split(X_multi, y_multi, test_size=0.2, random_state=42)" ] }, { "cell_type": "markdown", "id": "450c23cd", "metadata": {}, "source": [ "The next step is to train a regression model (`LinearRegression`)." ] }, { "cell_type": "code", "execution_count": 8, "id": "a5531057", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LinearRegression()
" ], "text/plain": [ "LinearRegression()" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Multivariate Model\n", "model_multi = LinearRegression()\n", "model_multi.fit(X_train_multi, y_train_multi)" ] }, { "cell_type": "markdown", "id": "927722ea", "metadata": {}, "source": [ "To evaluate the model, we first compute its predictions on the test set." ] }, { "cell_type": "code", "execution_count": 9, "id": "cbefe62a", "metadata": {}, "outputs": [], "source": [ "# Predictions\n", "y_pred_multi = model_multi.predict(X_test_multi)" ] }, { "cell_type": "markdown", "id": "204f561f", "metadata": {}, "source": [ "### 3.1. MSE and R-squared" ] }, { "cell_type": "code", "execution_count": 10, "id": "b87b1925", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Squared Error (MSE): 0.012384680824798728\n", "R-squared (R²): 0.9999982803305351\n" ] } ], "source": [ "# MSE and R-squared\n", "mse_multi = mean_squared_error(y_test_multi, y_pred_multi)\n", "r2_multi = r2_score(y_test_multi, y_pred_multi)\n", "\n", "print(\"Mean Squared Error (MSE):\", mse_multi)\n", "print(\"R-squared (R²):\", r2_multi)" ] }, { "cell_type": "markdown", "id": "71c17243", "metadata": {}, "source": [ "### 3.2. Adjusted R-squared" ] }, { "cell_type": "code", "execution_count": 11, "id": "e2eec01e", "metadata": {}, "outputs": [], "source": [ "# Adjusted R-squared: penalizes R² for the number of predictors p relative to the sample size n\n", "n = len(y_test_multi)  # number of test samples used to compute r2_multi\n", "p = X_test_multi.shape[1]  # number of predictors (features)\n", "adj_r2_multi = 1 - (1 - r2_multi) * ((n - 1) / (n - p - 1))" ] }, { "cell_type": "code", "execution_count": 12, "id": "baaad849", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Adjusted R-squared: 0.9999979578925103\n" ] } ], "source": [ "print(\"Adjusted R-squared:\", adj_r2_multi)" ] } ], "metadata": { "jupytext": { "formats": "md:myst", "text_representation": { "extension": ".md", "format_name": "myst" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.7" }, "source_map": [ 11, 30, 36, 43, 47, 52, 56, 61, 64, 69, 73, 78, 82, 89, 93, 98, 102, 107, 110, 115, 122, 127, 135 ] }, "nbformat": 4, "nbformat_minor": 5 }