Search Methods#

  • This is a supplement material for the Machine Learning Simplified book. It sheds light on Python implementations of the topics discussed while all detailed explanations can be found in the book.

  • I also assume you know Python syntax and how it works. If you don’t, I highly recommend you to take a break and get introduced to the language before going forward with my code.

  • This material can be downloaded as a Jupyter notebook (Download button in the upper-right corner -> .ipynb) to reproduce the code and play around with it.

1. Required Libraries, Data & Variables#

Let’s import the data and have a look at it:

import pandas as pd
warnings.filterwarnings('ignore') # ignoring all warnings

data = pd.read_csv('https://github.com/5x12/themlsbook/raw/master/supplements/data/car_price.csv', delimiter=',', header=0)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [1], line 2
      1 import pandas as pd
----> 2 warnings.filterwarnings('ignore') # ignoring all warnings
      4 data = pd.read_csv('https://github.com/5x12/themlsbook/raw/master/supplements/data/car_price.csv', delimiter=',', header=0)

NameError: name 'warnings' is not defined
data.head()
data.columns

Let’s define features \(X\) and a target variable \(y\):

data['price']=data['price'].astype('int')

X = data[['wheelbase', 
          'carlength', 
          'carwidth', 
          'carheight', 
          'curbweight', 
          'enginesize', 
          'boreratio', 
          'stroke',
          'compressionratio', 
          'horsepower', 
          'peakrpm', 
          'citympg', 
          'highwaympg']]

y = data['price']

Let’s split the data:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

2. Wrapper methods#

The following Search methods are examined:

  1. Step Forward Feature Selection method

  2. Step Backward Feature Selection method

  3. Recursive Feature Elimination method

2.1. Step Forward Feature Selection#

# Importing required libraries
from mlxtend.feature_selection import SequentialFeatureSelector as sfs
from sklearn.ensemble import RandomForestClassifier
# Set a model (Random Forest Classifier) to use in SFFS
model = RandomForestClassifier(n_estimators=100)

# Set step forward feature selection
sfs = sfs(model, # model (defined above) to use in SFFS
          k_features=4, # return top 4 features from the feature set X
          forward=True, # True for SFFS, False for SBFS (explained below)
          floating=False,
          verbose=2,
          scoring='accuracy', # metrics to use to estimate model's performance
          cv=2) #cross-validation=2

# Perform Step Forward Feature Selection by fitting X and y
sfs = sfs.fit(X_train, y_train)
# Return indexes the top 4 selected features
sfs.k_feature_idx_
# Return the labels of the top 4 selected features
top_forward = X.columns[list(sfs.k_feature_idx_)]
top_forward

2.2. Step Backward Feature Selection#

# Importing required libraries
from mlxtend.feature_selection import SequentialFeatureSelector as sfs
from sklearn.ensemble import RandomForestClassifier
import numpy as np
# Set a model (Random Forest Classifier) to use in SBFS
model = RandomForestClassifier(n_estimators=100)

# Set step backward feature selection
sfs = sfs(model, # model (defined above) to use in SBFS
           k_features=4, # return bottom 4 features from the feature set X
           forward=False, # False for SBFS, True for SFFS (explained above)
           floating=False, 
           verbose=2,
           scoring='r2', # metrics to use to estimate model's performance (here: R-squared)
           cv=2) #cross-validation=2

# Perform Step Backward Feature Selection by fitting X and y
sfs1 = sfs.fit(np.array(X_train), y_train)
# Return the labels of the bottom 4 selected features
top_backward = X.columns[list(sfs.k_feature_idx_)]
top_backward

2.3. Recursive Feature Elimination Method#

# Importing required libraries
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
# Set a model (Linear Regression) to use in RFEM
model = LinearRegression()

# Set step backward feature selection
rfe = RFE(model, 
          n_features_to_select=4, 
          step=1)

# Perform Step Backward Feature Selection by fitting X and y
rfe.fit(X, y)
# Return labels of the top 4 selected features
top_recursive = X.columns[rfe.support_]
print (top_recursive)
# Return labels and their scores of all features
print(dict(zip(X.columns, rfe.ranking_)))

3. Comparing Four Methods#

print('The features selected by Step Forward Feature Selection are: \n \n \t {} \n \n \n The features selected by Step Backward Feature Selection are: \n \n \t {} \n \n \n The features selected by Recursive Feature Elimination are: \n \n \t {}'.format(top_forward, top_backward, top_recursive))