{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[Previous Notebook](Part_2.ipynb)\n", " \n", " \n", " \n", " \n", "[Home Page](../Start_Here.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# CNN Primer and Keras 101 - Continued \n", "\n", "This notebook introduces Convolutional Neural Networks (CNNs) and their terminology.\n", "\n", "**Contents of this Notebook:**\n", "\n", "- [Convolution Neural Networks ( CNNs )](#Convolution-Neural-Networks-(-CNNs-))\n", "- [Why CNNs are good in Image related tasks? ](#Why-CNNs-are-good-in-Image-related-tasks?)\n", "- [Implementing Image Classification using CNN's](#Implementing-Image-Classification-using-CNN's)\n", "- [Conclusion](#Conclusion-:)\n", "\n", "\n", "**By the end of this notebook you will:**\n", "\n", "- Understand how a Convolutional Neural Network works\n", "- Write your own CNN classifier and train it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convolution Neural Networks ( CNNs ) \n", "\n", "Convolutional Neural Networks are widely used for image classification, object detection, and face recognition because they need far fewer parameters than fully connected networks without sacrificing model quality.\n", "\n", "Let's now look at what makes up a CNN architecture and how it works.\n", "\n", "Here is an example of a CNN architecture for a classification task:\n", "\n", "![alt_text](images/cnn.jpeg)\n", "\n", "*Source: https://fr.mathworks.com/solutions/deep-learning/convolutional-neural-network.html*\n", "\n", "Each input image passes through a series of convolution layers with filters (kernels), pooling layers, and fully connected (FC) layers, and a Softmax function is then applied to classify the object with probability values between 0 and 1. 
\n", "\n", "Let us briefly discuss the following:\n", "\n", "- Convolution Layer \n", "- Strides and Padding \n", "- Pooling Layer\n", "- Fully Connected Layer \n", "\n", "#### Convolution Layer : \n", "\n", "The convolution layer is the first layer to extract features from the input, and it preserves the relationships between neighbouring pixels. It does so by sliding a small matrix of learnable weights, called a kernel (or filter), over the input. The kernel size is a hyperparameter and can be altered according to the complexity of the problem.\n", "\n", "Now that we've discussed kernels, let's see how a kernel operates on the layer.\n", "\n", "![alt_text](images/conv.gif)\n", "\n", "*Source: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53*\n", "\n", "We have seen how the convolution operation works; let us now see how it is carried out on a layer with multiple channels.\n", "\n", "![alt_text](images/conv_depth.png)\n", "\n", "*Source: https://towardsdatascience.com/a-comprehensive-introduction-to-different-types-of-convolutions-in-deep-learning-669281e58215*\n", "\n", "\n", "Let us define the terms :\n", "\n", "- Hin : Height dimension of the layer\n", "- Win : Width dimension of the layer\n", "- Din : Depth of the layer\n", "- h : height of the kernel \n", "- w : width of the kernel \n", "- Dout : Number of kernels acting on the layer \n", "\n", "Note : Din must be the same for the layer and the kernel.\n", "\n", "Here, Din and Dout are also called the number of channels of the input and output layers, respectively. We can notice from the first image that the number of channels typically keeps increasing over the layers while the height and width keep decreasing. This is done so that the filters learn features from the previous layers; these channels are therefore also called feature channels.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Strides and Padding \n", "\n", "Stride is the number of pixels by which the filter shifts over the input matrix during convolution. 
When the stride is 1, we move the filter 1 pixel at a time; when the stride is 2, we move it 2 pixels at a time, and so on. \n", "\n", "Sometimes the filter does not fit perfectly on the input image. So, we have two options:\n", "- Pad the picture with zeros (zero-padding) so that it fits\n", "- Drop the part of the image where the filter did not fit. This is called valid padding, which keeps only the valid part of the image.\n", "\n", "#### Pooling Layer :\n", "\n", "Pooling layers reduce the size of the feature maps, and hence the number of parameters in later layers, when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each map but retains important information. Spatial pooling can be of different types:\n", "- Max Pooling\n", "  - Max pooling is one of the most common pooling types; it takes the largest element from each window of the rectified feature map.\n", "- Average Pooling\n", "  - Taking the average of the elements in each window is called average pooling.\n", "- Sum Pooling\n", "  - Taking the sum of all elements in each window of the feature map is called sum pooling.\n", "\n", "![alt_text](images/max_pool.png)\n", "\n", "*Source: https://www.programmersought.com/article/47163598855/*\n", "\n", "#### Fully Connected Layer :\n", "\n", "We then flatten the output of the convolution layers and feed it into a _Fully Connected layer_ to generate a prediction. The fully connected layer is an ANN model whose inputs are the features extracted from the input by the convolution layers. 
\n", "\n", "These fully connected layers are trained along with the _kernels_ during the training process.\n", "\n", "Later, in our example, we will also compare CNNs and ANNs to benchmark their results on image classification tasks.\n", "\n", "### Transposed Convolution :\n", "\n", "When we apply the convolution operation over an image, the number of channels increases while the height and width of the image decrease. Some applications, however, need to up-sample the images; _transposed convolution_ helps to up-sample the images from these layers.\n", "\n", "Here is an animation of transposed convolution: \n", "\n", "\n", "
\n", " | \n", " |