Evaluation Metrics for Classification#
This is supplementary material for the Machine Learning Simplified book. It focuses on the Python implementations of the topics discussed there, while all detailed explanations can be found in the book.
I also assume you are familiar with Python syntax and how it works. If you are not, I highly recommend taking a break and getting acquainted with the language before working through the code.
This material can be downloaded as a Jupyter notebook (Download button in the upper-right corner -> .ipynb) to reproduce the code and play around with it.
This notebook is a supplement to Chapter 13 (Model Evaluation) of the Machine Learning Simplified book.
It covers a comprehensive evaluation of both a binary and a multiclass classifier.
1. Required Libraries#
This block imports all necessary libraries.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score, recall_score,
f1_score, fbeta_score, roc_auc_score, roc_curve, precision_recall_curve,
log_loss)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
2. Model Evaluation for Binary Classifier#
Let’s first generate a hypothetical dataset and split it into train and test sets.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
Let’s now train a binary classifier using Random Forest.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_scores = model.predict_proba(X_test)[:, 1]  # Probability of the positive class, used for ROC and precision-recall
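As a quick sanity check (an illustration, not from the book), the hard labels from predict should agree with thresholding the positive-class probabilities at 0.5, up to exact ties:
# Hard predictions vs. manual 0.5 thresholding of the probabilities
# (exact 0.5 ties may be broken differently by predict)
manual_pred = (y_scores >= 0.5).astype(int)
print("Agreement with predict():", np.mean(manual_pred == y_pred))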
After that, we can proceed with model evaluation.
2.1. Confusion Matrix#
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)
Confusion Matrix:
[[107 8]
[ 22 113]]
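For a binary problem, scikit-learn arranges the matrix with rows as true labels and columns as predicted labels, i.e. [[TN, FP], [FN, TP]], so the four counts can be unpacked directly:
# Unpack the binary confusion matrix (rows = true labels, columns = predicted labels)
tn, fp, fn, tp = cm.ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)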
2.2. Calculating Accuracy, Precision, and Recall#
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
Accuracy: 0.88
Precision: 0.9338842975206612
Recall: 0.837037037037037
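These values follow directly from the confusion-matrix counts unpacked above; a quick manual verification:
# Recompute the three metrics from raw counts to confirm the formulas
print("Accuracy: ", (tp + tn) / (tp + tn + fp + fn))  # share of all predictions that are correct
print("Precision:", tp / (tp + fp))  # share of predicted positives that are truly positive
print("Recall:   ", tp / (tp + fn))  # share of true positives that were found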
2.3. Plotting Precision-Recall Curve#
precisions, recalls, _ = precision_recall_curve(y_test, y_scores)  # renamed to avoid shadowing the scalar precision and recall above
plt.figure()
plt.plot(recalls, precisions, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall curve')
plt.show()
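The whole curve can be summarized by a single number, the average precision (the area under the precision-recall curve); a one-line sketch using scikit-learn’s average_precision_score:
from sklearn.metrics import average_precision_score

# Area under the precision-recall curve, computed from the same probability scores
print("Average Precision:", average_precision_score(y_test, y_scores))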
2.4. Calculating F1 Score, F0.5, and F2 Scores#
f1 = f1_score(y_test, y_pred)
f0_5 = fbeta_score(y_test, y_pred, beta=0.5)
f2 = fbeta_score(y_test, y_pred, beta=2)
print("F1 Score:", f1)
print("F0.5 Score:", f0_5)
print("F2 Score:", f2)
F1 Score: 0.8828125
F0.5 Score: 0.9127625201938611
F2 Score: 0.8547655068078669
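All three are instances of the same formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta > 1 favors recall and beta < 1 favors precision. A small sketch verifying this against the scalar precision and recall computed in section 2.2:
# F-beta from first principles, reusing the scalar precision and recall from above
def f_beta(p, r, beta):
    return (1 + beta**2) * p * r / (beta**2 * p + r)

for beta in (1, 0.5, 2):
    print(f"F{beta} (manual):", f_beta(precision, recall, beta))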
2.5. Calculating ROC AUC#
roc_auc = roc_auc_score(y_test, y_scores)
print("ROC AUC:", roc_auc)
ROC AUC: 0.9337520128824477
2.6. Visualizing ROC Curve#
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
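Beyond visualization, the roc_curve output can be used to choose an operating threshold. One common heuristic (not covered in the book, shown here only as a sketch) is Youden’s J statistic, which picks the threshold maximizing TPR - FPR:
# Youden's J: the threshold where the ROC curve is farthest above the diagonal
best_idx = np.argmax(tpr - fpr)
print("Best threshold by Youden's J:", thresholds[best_idx])
print("TPR:", tpr[best_idx], "FPR:", fpr[best_idx])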
2.7. Calculating Logarithmic Loss#
logloss = log_loss(y_test, y_scores)
print("Logarithmic Loss:", logloss)
Logarithmic Loss: 0.33230804473299996
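Log loss is the negative mean log-likelihood of the true labels under the predicted probabilities. A minimal manual version (clipping probabilities for numerical stability, so the last decimals may differ from scikit-learn’s):
# Manual log loss: -mean(y*log(p) + (1-y)*log(1-p)), with clipping to avoid log(0)
p = np.clip(y_scores, 1e-15, 1 - 1e-15)
print("Manual Logarithmic Loss:", -np.mean(y_test * np.log(p) + (1 - y_test) * np.log(1 - p)))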
3. Model Evaluation for Multiclass Classifier#
Let’s first generate a hypothetical dataset and split it into train and test sets.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3, random_state=42, n_clusters_per_class=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
Let’s now train a multiclass classifier using Random Forest.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_scores = model.predict_proba(X_test)  # Predicted probability for each class
After that, we can proceed with model evaluation.
3.1. Confusion Matrix#
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)
Confusion Matrix:
[[73 5 8]
[ 8 67 1]
[ 3 0 85]]
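The diagonal holds the correctly classified samples of each class; dividing it by row sums gives per-class recall, and by column sums per-class precision. A quick sketch:
# Rows are true labels, columns are predictions
print("Per-class recall:   ", cm.diagonal() / cm.sum(axis=1))
print("Per-class precision:", cm.diagonal() / cm.sum(axis=0))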
3.2. Calculating Accuracy#
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.9
3.3. Calculating Precision, Recall, and F1 Score (Macro and Micro)#
precision_macro = precision_score(y_test, y_pred, average='macro')
precision_micro = precision_score(y_test, y_pred, average='micro')
recall_macro = recall_score(y_test, y_pred, average='macro')
recall_micro = recall_score(y_test, y_pred, average='micro')
f1_macro = f1_score(y_test, y_pred, average='macro')
f1_micro = f1_score(y_test, y_pred, average='micro')
print("Precision Macro:", precision_macro)
print("Precision Micro:", precision_micro)
print("Recall Macro:", recall_macro)
print("Recall Micro:", recall_micro)
print("F1 Score Macro:", f1_macro)
print("F1 Score Micro:", f1_micro)
Precision Macro: 0.9012861645840369
Precision Micro: 0.9
Recall Macro: 0.8987750825266124
Recall Micro: 0.9
F1 Score Macro: 0.8994316229610347
F1 Score Micro: 0.9
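Macro averaging is the unweighted mean of the per-class scores, while micro averaging pools all individual decisions (for single-label multiclass problems it equals accuracy). A sketch making this explicit with average=None:
# average=None returns one score per class; the macro score is simply their mean
per_class_f1 = f1_score(y_test, y_pred, average=None)
print("Per-class F1:", per_class_f1)
print("Macro F1 (manual):", per_class_f1.mean())  # should match f1_macro above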
3.4. Calculating ROC AUC for Multiclass#
The one-versus-rest (OvR) approach is often used for multiclass ROC AUC: each class is scored against all the others, and the resulting per-class AUCs are averaged.
roc_auc = roc_auc_score(y_test, y_scores, multi_class='ovr')
print("ROC AUC (One-vs-Rest):", roc_auc)
ROC AUC (One-vs-Rest): 0.9787848549290001
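By default, roc_auc_score averages the per-class OvR AUCs equally (average='macro'); a class-frequency weighted average is also available:
# Weight each class's AUC by its support instead of averaging equally
roc_auc_weighted = roc_auc_score(y_test, y_scores, multi_class='ovr', average='weighted')
print("ROC AUC (One-vs-Rest, weighted):", roc_auc_weighted)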
3.5. Visualizing Precision-Recall Curve (for each class)#
plt.figure()
colors = ['blue', 'green', 'red']
for i, color in enumerate(colors):
    precision, recall, _ = precision_recall_curve(y_test == i, y_scores[:, i])
    plt.plot(recall, precision, color=color, label=f'Class {i}')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall curve by class')
plt.legend(loc="best")
plt.show()
3.6. Calculating Logarithmic Loss#
logloss = log_loss(y_test, y_scores)
print("Logarithmic Loss:", logloss)
Logarithmic Loss: 0.28576347304315963
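The multiclass version only scores the probability assigned to each sample’s true class. A minimal manual equivalent (clipping again, so the last decimals may differ):
# Manual multiclass log loss: negative mean log-probability of the true class
p = np.clip(y_scores, 1e-15, 1 - 1e-15)
print("Manual Logarithmic Loss:", -np.mean(np.log(p[np.arange(len(y_test)), y_test])))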