{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5e959844",
   "metadata": {},
   "source": [
    "(chapter9_part1)=\n",
    "\n",
    "\n",
    "\n",
    "## Bagging Models\n",
    "\n",
    "- This is a supplement material for the [Machine Learning Simplified](https://themlsbook.com) book. It sheds light on Python implementations of the topics discussed while all detailed explanations can be found in the book. \n",
    "- I also assume you know Python syntax and how it works. If you don't, I highly recommend you to take a break and get introduced to the language before going forward with my code. \n",
    "- This material can be downloaded as a Jupyter notebook (Download button in the upper-right corner -> `.ipynb`) to reproduce the code and play around with it. \n",
    "\n",
    "\n",
    "This notebook is a supplement for *Chapter 9. Ensemble Models* of **Machine Learning For Everyone** book.\n",
    "\n",
    "## 1. Required Libraries, Data & Variables\n",
    "\n",
    "Let's import the data and have a look at it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c23451ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "data = {\n",
    "    'Day': list(range(1, 31)),\n",
    "    'Temperature': [\n",
    "        'Cold', 'Hot', 'Cold', 'Hot', 'Hot',\n",
    "        'Cold', 'Hot', 'Cold', 'Hot', 'Cold',\n",
    "        'Hot', 'Cold', 'Hot', 'Cold', 'Hot',\n",
    "        'Cold', 'Hot', 'Cold', 'Hot', 'Cold',\n",
    "        'Hot', 'Cold', 'Hot', 'Cold', 'Hot',\n",
    "        'Cold', 'Hot', 'Cold', 'Hot', 'Cold'\n",
    "    ],\n",
    "    'Humidity': [\n",
    "        'Normal', 'Normal', 'Normal', 'High', 'High',\n",
    "        'Normal', 'High', 'Normal', 'High', 'Normal',\n",
    "        'High', 'Normal', 'High', 'Normal', 'High',\n",
    "        'Normal', 'High', 'Normal', 'High', 'Normal',\n",
    "        'High', 'Normal', 'High', 'Normal', 'High',\n",
    "        'Normal', 'High', 'Normal', 'High', 'Normal'\n",
    "    ],\n",
    "    'Outlook': [\n",
    "        'Rain', 'Rain', 'Sunny', 'Sunny', 'Rain',\n",
    "        'Sunny', 'Rain', 'Sunny', 'Rain', 'Sunny',\n",
    "        'Rain', 'Sunny', 'Rain', 'Sunny', 'Rain',\n",
    "        'Sunny', 'Rain', 'Sunny', 'Rain', 'Sunny',\n",
    "        'Rain', 'Sunny', 'Rain', 'Sunny', 'Rain',\n",
    "        'Sunny', 'Rain', 'Sunny', 'Rain', 'Sunny'\n",
    "    ],\n",
    "    'Wind': [\n",
    "        'Strong', 'Weak', 'Weak', 'Weak', 'Weak',\n",
    "        'Strong', 'Weak', 'Weak', 'Weak', 'Strong',\n",
    "        'Weak', 'Weak', 'Strong', 'Weak', 'Weak',\n",
    "        'Weak', 'Strong', 'Weak', 'Weak', 'Weak',\n",
    "        'Strong', 'Weak', 'Weak', 'Weak', 'Weak',\n",
    "        'Strong', 'Weak', 'Weak', 'Weak', 'Strong'\n",
    "    ],\n",
    "    'Golf Played': [\n",
    "        'No', 'No', 'Yes', 'Yes', 'Yes',\n",
    "        'No', 'Yes', 'No', 'Yes', 'Yes',\n",
    "        'No', 'Yes', 'No', 'Yes', 'Yes',\n",
    "        'No', 'Yes', 'No', 'Yes', 'Yes',\n",
    "        'No', 'Yes', 'No', 'Yes', 'Yes',\n",
    "        'No', 'Yes', 'No', 'Yes', 'Yes'\n",
    "    ]\n",
    "}\n",
    "\n",
    "# Converting the dictionary into a DataFrame\n",
    "df = pd.DataFrame(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d7c1c4d4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Day</th>\n",
       "      <th>Temperature</th>\n",
       "      <th>Humidity</th>\n",
       "      <th>Outlook</th>\n",
       "      <th>Wind</th>\n",
       "      <th>Golf Played</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Cold</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Rain</td>\n",
       "      <td>Strong</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Hot</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Rain</td>\n",
       "      <td>Weak</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Cold</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Sunny</td>\n",
       "      <td>Weak</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Hot</td>\n",
       "      <td>High</td>\n",
       "      <td>Sunny</td>\n",
       "      <td>Weak</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Hot</td>\n",
       "      <td>High</td>\n",
       "      <td>Rain</td>\n",
       "      <td>Weak</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Cold</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Sunny</td>\n",
       "      <td>Strong</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>Hot</td>\n",
       "      <td>High</td>\n",
       "      <td>Rain</td>\n",
       "      <td>Weak</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>Cold</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Sunny</td>\n",
       "      <td>Weak</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>Hot</td>\n",
       "      <td>High</td>\n",
       "      <td>Rain</td>\n",
       "      <td>Weak</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>Cold</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Sunny</td>\n",
       "      <td>Strong</td>\n",
       "      <td>Yes</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Day Temperature Humidity Outlook    Wind Golf Played\n",
       "0    1        Cold   Normal    Rain  Strong          No\n",
       "1    2         Hot   Normal    Rain    Weak          No\n",
       "2    3        Cold   Normal   Sunny    Weak         Yes\n",
       "3    4         Hot     High   Sunny    Weak         Yes\n",
       "4    5         Hot     High    Rain    Weak         Yes\n",
       "5    6        Cold   Normal   Sunny  Strong          No\n",
       "6    7         Hot     High    Rain    Weak         Yes\n",
       "7    8        Cold   Normal   Sunny    Weak          No\n",
       "8    9         Hot     High    Rain    Weak         Yes\n",
       "9   10        Cold   Normal   Sunny  Strong         Yes"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Displaying the DataFrame\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3714f7f1",
   "metadata": {},
   "source": [
    "## 2. Preparation of the Dataset\n",
    "\n",
    "One-hot encoding the categorical variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "2ece4f7f",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/andrewwolf/.pyenv/versions/3.10.7/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:808: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "from sklearn.preprocessing import OneHotEncoder\n",
    "\n",
    "encoder = OneHotEncoder(sparse=False)\n",
    "encoded_features = encoder.fit_transform(df[['Temperature', 'Humidity', 'Outlook', 'Wind']])\n",
    "encoded_df = pd.DataFrame(encoded_features, columns=encoder.get_feature_names_out(['Temperature', 'Humidity', 'Outlook', 'Wind']))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21f25d5f",
   "metadata": {},
   "source": [
    "Visualizing the first 10 records of the encoded dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "478e00f0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Temperature_Cold</th>\n",
       "      <th>Temperature_Hot</th>\n",
       "      <th>Humidity_High</th>\n",
       "      <th>Humidity_Normal</th>\n",
       "      <th>Outlook_Rain</th>\n",
       "      <th>Outlook_Sunny</th>\n",
       "      <th>Wind_Strong</th>\n",
       "      <th>Wind_Weak</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Temperature_Cold  Temperature_Hot  Humidity_High  Humidity_Normal  \\\n",
       "0               1.0              0.0            0.0              1.0   \n",
       "1               0.0              1.0            0.0              1.0   \n",
       "2               1.0              0.0            0.0              1.0   \n",
       "3               0.0              1.0            1.0              0.0   \n",
       "4               0.0              1.0            1.0              0.0   \n",
       "5               1.0              0.0            0.0              1.0   \n",
       "6               0.0              1.0            1.0              0.0   \n",
       "7               1.0              0.0            0.0              1.0   \n",
       "8               0.0              1.0            1.0              0.0   \n",
       "9               1.0              0.0            0.0              1.0   \n",
       "\n",
       "   Outlook_Rain  Outlook_Sunny  Wind_Strong  Wind_Weak  \n",
       "0           1.0            0.0          1.0        0.0  \n",
       "1           1.0            0.0          0.0        1.0  \n",
       "2           0.0            1.0          0.0        1.0  \n",
       "3           0.0            1.0          0.0        1.0  \n",
       "4           1.0            0.0          0.0        1.0  \n",
       "5           0.0            1.0          1.0        0.0  \n",
       "6           1.0            0.0          0.0        1.0  \n",
       "7           0.0            1.0          0.0        1.0  \n",
       "8           1.0            0.0          0.0        1.0  \n",
       "9           0.0            1.0          1.0        0.0  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "encoded_df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48d01775",
   "metadata": {},
   "source": [
    "Adding the encoded features back to the dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "2f431727",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Day</th>\n",
       "      <th>Temperature</th>\n",
       "      <th>Humidity</th>\n",
       "      <th>Outlook</th>\n",
       "      <th>Wind</th>\n",
       "      <th>Golf Played</th>\n",
       "      <th>Temperature_Cold</th>\n",
       "      <th>Temperature_Hot</th>\n",
       "      <th>Humidity_High</th>\n",
       "      <th>Humidity_Normal</th>\n",
       "      <th>Outlook_Rain</th>\n",
       "      <th>Outlook_Sunny</th>\n",
       "      <th>Wind_Strong</th>\n",
       "      <th>Wind_Weak</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Cold</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Rain</td>\n",
       "      <td>Strong</td>\n",
       "      <td>No</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Hot</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Rain</td>\n",
       "      <td>Weak</td>\n",
       "      <td>No</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Cold</td>\n",
       "      <td>Normal</td>\n",
       "      <td>Sunny</td>\n",
       "      <td>Weak</td>\n",
       "      <td>Yes</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Hot</td>\n",
       "      <td>High</td>\n",
       "      <td>Sunny</td>\n",
       "      <td>Weak</td>\n",
       "      <td>Yes</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Hot</td>\n",
       "      <td>High</td>\n",
       "      <td>Rain</td>\n",
       "      <td>Weak</td>\n",
       "      <td>Yes</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Day Temperature Humidity Outlook    Wind Golf Played  Temperature_Cold  \\\n",
       "0    1        Cold   Normal    Rain  Strong          No               1.0   \n",
       "1    2         Hot   Normal    Rain    Weak          No               0.0   \n",
       "2    3        Cold   Normal   Sunny    Weak         Yes               1.0   \n",
       "3    4         Hot     High   Sunny    Weak         Yes               0.0   \n",
       "4    5         Hot     High    Rain    Weak         Yes               0.0   \n",
       "\n",
       "   Temperature_Hot  Humidity_High  Humidity_Normal  Outlook_Rain  \\\n",
       "0              0.0            0.0              1.0           1.0   \n",
       "1              1.0            0.0              1.0           1.0   \n",
       "2              0.0            0.0              1.0           0.0   \n",
       "3              1.0            1.0              0.0           0.0   \n",
       "4              1.0            1.0              0.0           1.0   \n",
       "\n",
       "   Outlook_Sunny  Wind_Strong  Wind_Weak  \n",
       "0            0.0          1.0        0.0  \n",
       "1            0.0          0.0        1.0  \n",
       "2            1.0          0.0        1.0  \n",
       "3            1.0          0.0        1.0  \n",
       "4            0.0          0.0        1.0  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df.join(encoded_df)\n",
    "\n",
    "df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c34c1ba8",
   "metadata": {},
   "source": [
    "Preparing the features by removing categorical variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "3263541d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Temperature_Cold</th>\n",
       "      <th>Temperature_Hot</th>\n",
       "      <th>Humidity_High</th>\n",
       "      <th>Humidity_Normal</th>\n",
       "      <th>Outlook_Rain</th>\n",
       "      <th>Outlook_Sunny</th>\n",
       "      <th>Wind_Strong</th>\n",
       "      <th>Wind_Weak</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Temperature_Cold  Temperature_Hot  Humidity_High  Humidity_Normal  \\\n",
       "0               1.0              0.0            0.0              1.0   \n",
       "1               0.0              1.0            0.0              1.0   \n",
       "2               1.0              0.0            0.0              1.0   \n",
       "3               0.0              1.0            1.0              0.0   \n",
       "4               0.0              1.0            1.0              0.0   \n",
       "\n",
       "   Outlook_Rain  Outlook_Sunny  Wind_Strong  Wind_Weak  \n",
       "0           1.0            0.0          1.0        0.0  \n",
       "1           1.0            0.0          0.0        1.0  \n",
       "2           0.0            1.0          0.0        1.0  \n",
       "3           0.0            1.0          0.0        1.0  \n",
       "4           1.0            0.0          0.0        1.0  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = df.drop(['Day', 'Temperature', 'Humidity', 'Outlook', 'Wind', 'Golf Played'], axis=1)\n",
    "X.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52bd6b59",
   "metadata": {},
   "source": [
    "Defining y:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ac05086c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      No\n",
       "1      No\n",
       "2     Yes\n",
       "3     Yes\n",
       "4     Yes\n",
       "5      No\n",
       "6     Yes\n",
       "7      No\n",
       "8     Yes\n",
       "9     Yes\n",
       "10     No\n",
       "11    Yes\n",
       "12     No\n",
       "13    Yes\n",
       "14    Yes\n",
       "15     No\n",
       "16    Yes\n",
       "17     No\n",
       "18    Yes\n",
       "19    Yes\n",
       "20     No\n",
       "21    Yes\n",
       "22     No\n",
       "23    Yes\n",
       "24    Yes\n",
       "25     No\n",
       "26    Yes\n",
       "27     No\n",
       "28    Yes\n",
       "29    Yes\n",
       "Name: Golf Played, dtype: object"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = df['Golf Played']\n",
    "\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdd1a969",
   "metadata": {},
   "source": [
    "Splitting the dataset into training and testing sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c0e9d84a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2197f47",
   "metadata": {},
   "source": [
    "## 3. Bagging Ensemble\n",
    "\n",
    "### 3.1. Building a Boosting Ensemble\n",
    "\n",
    "Creating the Gradient Boosting classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f2e0d664",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import BaggingClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "bfe562c1",
   "metadata": {},
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'DecisionTreeClassifier' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "Cell \u001b[0;32mIn [10], line 4\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[38;5;66;03m# Creating the Bagging classifier\u001b[39;00m\n\u001b[1;32m      2\u001b[0m \u001b[38;5;66;03m# Using a DecisionTreeClassifier as the base classifier\u001b[39;00m\n\u001b[1;32m      3\u001b[0m model \u001b[38;5;241m=\u001b[39m BaggingClassifier(\n\u001b[0;32m----> 4\u001b[0m                             base_estimator\u001b[38;5;241m=\u001b[39m\u001b[43mDecisionTreeClassifier\u001b[49m(), \n\u001b[1;32m      5\u001b[0m                             n_estimators\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m10\u001b[39m,  \u001b[38;5;66;03m# Number of trees\u001b[39;00m\n\u001b[1;32m      6\u001b[0m                             max_samples\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m0.8\u001b[39m,  \u001b[38;5;66;03m# Fraction of samples to draw from X to train each base estimator\u001b[39;00m\n\u001b[1;32m      7\u001b[0m                             max_features\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m0.8\u001b[39m,  \u001b[38;5;66;03m# Fraction of features to draw from X to train each base estimator\u001b[39;00m\n\u001b[1;32m      8\u001b[0m                             random_state\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m42\u001b[39m\n\u001b[1;32m      9\u001b[0m                          )\n\u001b[1;32m     10\u001b[0m model\u001b[38;5;241m.\u001b[39mfit(X_train, y_train)\n",
      "\u001b[0;31mNameError\u001b[0m: name 'DecisionTreeClassifier' is not defined"
     ]
    }
   ],
   "source": [
    "# Creating the Bagging classifier\n",
    "# Using a DecisionTreeClassifier as the base classifier\n",
    "model = BaggingClassifier(\n",
    "                            base_estimator=DecisionTreeClassifier(), \n",
    "                            n_estimators=10,  # Number of trees\n",
    "                            max_samples=0.8,  # Fraction of samples to draw from X to train each base estimator\n",
    "                            max_features=0.8,  # Fraction of features to draw from X to train each base estimator\n",
    "                            random_state=42\n",
    "                         )\n",
    "model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b68d36fb",
   "metadata": {},
   "source": [
    "### 3.2. Visualizing boosted ensemble"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba349ad7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.tree import DecisionTreeClassifier, plot_tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9649614",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Building 5 decision trees\n",
    "feature_names = encoder.get_feature_names_out(['Temperature', 'Humidity', 'Outlook', 'Wind'])\n",
    "trees = [DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42 + i) for i in range(5)]\n",
    "for tree in trees:\n",
    "    tree.fit(X_train, y_train)\n",
    "\n",
    "# Plotting all 5 trees\n",
    "fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(20, 4), dpi=300)\n",
    "for i, tree in enumerate(trees):\n",
    "    plot_tree(tree, feature_names=feature_names, class_names=['No', 'Yes'], filled=True, ax=axes[i])\n",
    "    axes[i].set_title(f'Tree {i+1}')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8b36e47",
   "metadata": {},
   "source": [
    "### 3.3. Predicting the Results\n",
    "\n",
    "Predicting the test set results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdb4d46d",
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred = model.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30b5c848",
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "944a257e",
   "metadata": {},
   "source": [
    "### 3.4. Evaluating the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16346dd0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import accuracy_score, classification_report"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11bf9725",
   "metadata": {},
   "outputs": [],
   "source": [
    "accuracy = accuracy_score(y_test, y_pred)\n",
    "report = classification_report(y_test, y_pred)\n",
    "\n",
    "print(\"Accuracy:\", accuracy)\n",
    "print(\"Classification Report:\\n\", report)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5740b49",
   "metadata": {},
   "source": [
    "## 4. Random Forest Classifier\n",
    "\n",
    "### 4.1. Building a Boosting Ensemble\n",
    "\n",
    "Creating the Random Forest Classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14fc483f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16dd1014",
   "metadata": {},
   "outputs": [],
   "source": [
    "random_forest = RandomForestClassifier(n_estimators=100, criterion='gini', max_depth=3, random_state=42)\n",
    "\n",
    "random_forest.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cce2575",
   "metadata": {},
   "source": [
    "### 4.2. Predicting the Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f95b2cb8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Making predictions on the test set\n",
    "y_pred = random_forest.predict(X_test)\n",
    "\n",
    "y_pred"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a09cb020",
   "metadata": {},
   "source": [
    "### 4.3. Evaluating the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "435bccda",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Evaluating the model\n",
    "accuracy = accuracy_score(y_test, y_pred)\n",
    "report = classification_report(y_test, y_pred)\n",
    "\n",
    "print(\"Accuracy:\", accuracy)\n",
    "print(\"Classification Report:\\n\", report)"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "formats": "md:myst",
   "text_representation": {
    "extension": ".md",
    "format_name": "myst"
   }
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.7"
  },
  "source_map": [
   11,
   31,
   83,
   86,
   93,
   99,
   104,
   106,
   111,
   115,
   120,
   123,
   128,
   132,
   137,
   141,
   150,
   155,
   166,
   171,
   176,
   191,
   198,
   203,
   205,
   210,
   215,
   221,
   230,
   235,
   239,
   244,
   249,
   254
  ]
 },
 "nbformat": 4,
 "nbformat_minor": 5
}