{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "     \n", "     \n", "     \n", "     \n", "     \n", "   \n", "[Home Page](../Start_Here.ipynb)\n", "\n", "\n", "[Previous Notebook](Manipulation_of_Image_Data_and_Category_Determination_using_Text_Data.ipynb)\n", "     \n", "     \n", "     \n", "     \n", "[1](The_Problem_Statement.ipynb)\n", "[2](Approach_to_the_Problem_&_Inspecting_and_Cleaning_the_Required_Data.ipynb)\n", "[3](Manipulation_of_Image_Data_and_Category_Determination_using_Text_Data.ipynb)\n", "[4]\n", "[5](Competition.ipynb)\n", "     \n", "     \n", "     \n", "     \n", "[Next Notebook](Competition.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tropical Cyclone Intensity Estimation using a Deep Convolutional Neural Network - Part 3 \n", "\n", "**Contents of this notebook:**\n", "\n", "- [Understand the drawbacks of the existing solution](#Understanding-the-drawbacks)\n", "- [Working out the solution](#Working-out-the-solution)\n", " - [Data Augmentation](#Data-Augmentation)\n", "- [Training the model](#Training-the-Model-with-Data-Augmentation)\n", "\n", "**By the end of this notebook participants will:**\n", "\n", "- Learn how to improve the previous model.\n", "- Apply data augmentation.\n", "- Tune hyperparameters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Understanding the drawbacks\n", "\n", "> Simply put, a machine learning model is only as good as the data it is fed with.\n", "\n", "We achieved an accuracy of roughly 85% after training for 4 epochs. Now we will try to increase the accuracy by taking a closer look at the dataset and images. 
We can observe the following class counts from our previous notebook: " ] }, { "cell_type": "raw", "metadata": {}, "source": [ "NC :441\n", "TD :4033\n", "TC :7948\n", "H1 :5340\n", "H2 :3150\n", "H3 :2441\n", "H4 :2114\n", "H5 :390" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first thing we notice from the category counts is that the number of images per category is highly non-uniform, with a TC:H5 ratio **greater than 20:1**. This imbalance can bias our CNN model, because mispredicting the minority class barely affects the model while that class contributes less than 5% of the dataset.\n", "\n", "The heatmap we obtained in the previous notebook shows the same effect: notice that the classes with more data were mostly predicted correctly, while the minority classes were mispredicted more often than the others.\n", "![alt_text](images/heatmap.png)\n", "\n", "\n", "Let us now see how we can solve this problem using data augmentation: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working out the solution\n", "\n", "## Data Augmentation \n", "\n", "To reduce the non-uniformity, we will flip and rotate images to compensate for the lack of data in the classes with fewer samples: \n", "\n", "![alt text](images/augment.png)\n", "\n", "\n", "We will use OpenCV for flipping and rotating images: \n", "\n", "``` python\n", "cv2.flip(img, 0)   # flip vertically (around the x-axis)\n", "cv2.flip(img, 1)   # flip horizontally (around the y-axis)\n", "# Note: 90- and 270-degree rotations swap the output size to (h, w)\n", "cv2.warpAffine(img, cv2.getRotationMatrix2D(center, 90, 1.0), (h, w))\n", "cv2.warpAffine(img, cv2.getRotationMatrix2D(center, 180, 1.0), (w, h))\n", "cv2.warpAffine(img, cv2.getRotationMatrix2D(center, 270, 1.0), (h, w))\n", "```\n", "\n", "There are other ways to counter data imbalance, such as class weighting, oversampling, and SMOTE."
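, "\n", "The flips and rotations above can be sanity-checked with a minimal sketch; here NumPy stands in for the OpenCV calls (`np.flipud` behaves like `cv2.flip(img, 0)`, and `np.rot90` like the 90-degree `warpAffine` rotation):\n", "\n", "``` python\n", "import numpy as np\n", "\n", "img = np.arange(6).reshape(2, 3)     # toy 2x3 'image'\n", "flipped = np.flipud(img)             # vertical flip, like cv2.flip(img, 0)\n", "rotated = np.rot90(img)              # 90-degree rotation\n", "print(flipped.shape, rotated.shape)  # rotation swaps the shape: (2, 3) (3, 2)\n", "```\n", "\n", "The shape swap is why the OpenCV snippet passes `(h, w)` for the 90- and 270-degree rotations. Each transform produces a new sample that keeps the original label, which is how the minority classes gain extra images.\n"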
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Training the Model with Data Augmentation \n", "\n", "\n", "We create a new function called `augmentation(name,category,filenames,labels,i)` that adds more samples to the categories with imbalanced data. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "sys.path.append('/workspace/python/source_code')\n", "# Import utility functions\n", "from utils import * \n", "import os\n", "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"\n", "\n", "\n", "# Define the augmentation function \n", "def augmentation(name,category,filenames,labels,i):\n", "    # Important constants\n", "    file_path = \"Dataset/Aug/\"\n", "    images = []\n", "    (h, w) = (232,232)\n", "    center = (w / 2, h / 2)\n", "    scale = 1.0\n", "    img = load_image(name , interpolation = cv2.INTER_LINEAR)\n", "\n", "    # Add one flipped copy for the minority classes (NC = 0 and H5 = 7);\n", "    # the remaining categories are left unchanged for now.\n", "    if category == 0 :\n", "        images.append(cv2.flip(img,0))\n", "    elif category == 7 :\n", "        images.append(cv2.flip(img,0))\n", "\n", "    # Write the augmented images to disk and record their paths and labels\n", "    for j in range(len(images)):\n", "        cv2.imwrite(file_path+str(i+j)+'.jpeg',images[j])\n", "        filenames.append(file_path+str(i+j)+'.jpeg')\n", "        labels.append(category)\n", "    i = i + len(images)\n", "    return i" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### We pass this function to our `load_dataset()` function to generate these augmentations. \n", "\n", "Kindly wait a couple of minutes while the images are augmented."
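, "\n", "\n", "To get a feel for what this does to the class balance, here is a rough count check in plain Python, using the class counts listed earlier and assuming one extra flipped copy per image in the two smallest classes, as in the `augmentation()` function above:\n", "\n", "``` python\n", "counts = {'NC': 441, 'TD': 4033, 'TC': 7948, 'H1': 5340,\n", "          'H2': 3150, 'H3': 2441, 'H4': 2114, 'H5': 390}\n", "\n", "# One flipped copy per image doubles the two smallest classes\n", "for cls in ('NC', 'H5'):\n", "    counts[cls] *= 2\n", "\n", "print(round(counts['TC'] / counts['H5'], 1))  # TC:H5 ratio drops from ~20:1 to about 10.2\n", "```\n", "\n", "This only halves the worst imbalance, so adding more transforms per minority-class image would reduce it further.\n"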
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "filenames,labels = load_dataset(augment_fn = augmentation)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set the size of the validation set\n", "val_filenames , val_labels = make_test_set(filenames,labels,val=0.1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Make the train/test split \n", "test = 0.1\n", "from sklearn.model_selection import train_test_split\n", "x_train, x_test, y_train, y_test = train_test_split(filenames, labels, test_size=test, random_state=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf\n", "y_train = tf.one_hot(y_train,depth=8)\n", "y_test = tf.one_hot(y_test,depth=8)\n", "val_labels = tf.one_hot(val_labels,depth=8)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Make the dataset compatible with TensorFlow data pipelining.\n", "train,test,val = make_dataset((x_train,y_train,128),(x_test,y_test,32),(val_filenames,val_labels,32))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The model described in the paper\n", "\n", "Now we will use the model described in the paper to evaluate its accuracy on the new dataset.\n", "\n", "![alt_text](images/model.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "tf.random.set_seed(1337)\n", "\n", "import tensorflow.keras\n", "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import Dense, Conv2D, Flatten ,Dropout, MaxPooling2D\n", "from tensorflow.keras import backend as K \n", "\n", "# Reset graphs and create a Sequential model\n", "\n", "K.clear_session()\n", "model = Sequential()\n", "#Convolution Layers\n", "\n", "model.add(Conv2D(64, 
kernel_size=10,strides=3, activation='relu', input_shape=(232,232,3)))\n", "model.add(MaxPooling2D(pool_size=(3, 3),strides=2))\n", "model.add(Conv2D(256, kernel_size=5,strides=1,activation='relu'))\n", "model.add(MaxPooling2D(pool_size=(3, 3),strides=2))\n", "model.add(Conv2D(288, kernel_size=3,strides=1,padding='same',activation='relu'))\n", "model.add(MaxPooling2D(pool_size=(2, 2),strides=1))\n", "model.add(Conv2D(272, kernel_size=3,strides=1,padding='same',activation='relu'))\n", "model.add(Conv2D(256, kernel_size=3,strides=1,activation='relu'))\n", "model.add(MaxPooling2D(pool_size=(3, 3),strides=2))\n", "model.add(Dropout(0.5))\n", "model.add(Flatten())\n", "\n", "#Linear Layers \n", "\n", "model.add(Dense(3584,activation='relu'))\n", "model.add(Dense(2048,activation='relu'))\n", "model.add(Dense(8, activation='softmax'))\n", "\n", "# Print Model Summary\n", "\n", "model.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import functools\n", "# Include the Top-2 accuracy metric \n", "top2_acc = functools.partial(tensorflow.keras.metrics.top_k_categorical_accuracy, k=2)\n", "top2_acc.__name__ = 'top2_acc'\n", "\n", "# Define the number of epochs\n", "epochs = 4\n", "\n", "# Training our model from scratch would take a long time,\n", "# so we load a partially trained model to speed up the process \n", "model.load_weights(\"trained_16.h5\")\n", "\n", "# Optimizer\n", "sgd = tensorflow.keras.optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9)\n", "\n", "\n", "# Compile the model with loss function, optimizer and metrics\n", "model.compile(loss=tensorflow.keras.losses.categorical_crossentropy, \n", " optimizer=sgd,\n", " metrics=['accuracy',top2_acc])\n", "\n", "# Train the Model \n", "trained_model = model.fit(train,\n", " epochs=epochs,\n", " verbose=1,\n", " validation_data=val)\n", "\n", "# Evaluate the model against the test set\n", "score = model.evaluate(test, verbose=0)\n", "print('Test loss:', score[0])\n", 
"print('Test accuracy:', score[1])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualisations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "f = plt.figure(figsize=(15,5))\n", "ax = f.add_subplot(121)\n", "ax.plot(trained_model.history['accuracy'])\n", "ax.plot(trained_model.history['val_accuracy'])\n", "ax.set_title('Model Accuracy')\n", "ax.set_ylabel('Accuracy')\n", "ax.set_xlabel('Epoch')\n", "ax.legend(['Train', 'Val'])\n", "\n", "ax2 = f.add_subplot(122)\n", "ax2.plot(trained_model.history['loss'])\n", "ax2.plot(trained_model.history['val_loss'])\n", "ax2.set_title('Model Loss')\n", "ax2.set_ylabel('Loss')\n", "ax2.set_xlabel('Epoch')\n", "ax2.legend(['Train', 'Val'],loc= 'upper left')\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import seaborn as sn\n", "from sklearn.metrics import confusion_matrix\n", "import pandas as pd\n", "\n", "# Plot a heatmap of the confusion matrix\n", "\n", "pred = model.predict(val)\n", "p = np.argmax(pred, axis=1)\n", "y_valid = np.argmax(val_labels, axis=1)\n", "results = confusion_matrix(y_valid, p) \n", "classes=['NC','TD','TC','H1','H2','H3','H4','H5']\n", "df_cm = pd.DataFrame(results, index = [i for i in classes], columns = [i for i in classes])\n", "plt.figure(figsize = (15,15))\n", "\n", "sn.heatmap(df_cm, annot=True, cmap=\"Blues\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us now save our model and its trained weights for future use:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Save our model \n", "model.save('cyc_pred.h5')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Licensing\n", "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 
International (CC BY 4.0)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "     \n", "     \n", "     \n", "     \n", "     \n", "   \n", "[Home Page](../Start_Here.ipynb)\n", "\n", "\n", "[Previous Notebook](Manipulation_of_Image_Data_and_Category_Determination_using_Text_Data.ipynb)\n", "     \n", "     \n", "     \n", "     \n", "[1](The_Problem_Statement.ipynb)\n", "[2](Approach_to_the_Problem_&_Inspecting_and_Cleaning_the_Required_Data.ipynb)\n", "[3](Manipulation_of_Image_Data_and_Category_Determination_using_Text_Data.ipynb)\n", "[4]\n", "[5](Competition.ipynb)\n", "     \n", "     \n", "     \n", "     \n", "[Next Notebook](Competition.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }