The human brain has specialised modules for vision, hearing and other senses. Artificial Intelligence aims to mimic these specialised modules using artificial neurons. The Convolutional Neural Network, or CNN, helps machines recognise visual patterns, objects, images and more.
CNNs emerged from the study of the brain's visual cortex and have been used in various Image Recognition tasks. In this blog, we will talk about Convolutional Neural Networks and try to solve an Image Classification task using them.
2. The need for CNN
Before jumping into the concepts of convolutional layers, I want to talk about why we need CNNs in the first place, and why we can't solve these Image Recognition tasks using the good ol' ANNs. There are 2 major reasons:
a. One reason is that in an Artificial Neural Network, each neuron in a layer is connected to every neuron in the following layer. This results in a large number of model parameters, which can be hard to train with limited computing power and can also lead to overfitting.
b. The second reason is that ANNs are not invariant to changes in the input. What do I mean by this? Say I have an image in my training set and I train an ANN model on it. If my validation set contains a translated, rotated or scaled version of the same image, the ANN will likely fail to recognise it, because it has only learned the exact pixel positions it saw during training.
The Convolutional Neural Network is able to deal with both of these shortcomings of the Artificial Neural Network, or Dense Neural Network. Let's learn about CNNs in the next section.
3. Convolutional Layers
The Convolutional Layers are the most important building blocks of a CNN. Each neuron in a convolutional layer is connected not to every neuron in the previous layer, but only to the pixels within its receptive field.
The local receptive field is the small region of the input that a neuron in a convolutional layer is exposed to during the convolution operation.
The small rectangles in the above image show the local receptive fields of the neurons. Another advantage of this architecture is that it allows the network to focus on small, low-level features in the initial hidden layers and then assemble them into larger, high-level features. For example, given an image of a human body, the initial hidden layers focus on low-level features like lines, circles and curves. The hidden layers in the middle recognise shapes made up of these low-level features. Finally, the deepest layers recognise entire objects like the face, arms and legs. Real-world images exhibit exactly this kind of hierarchical structure, as seen in the above example, which is why CNNs work so well on Image Recognition tasks.
A convolution operation between 2 matrices can be calculated by multiplying the corresponding elements and then adding them up. In a CNN, the convolution operation is performed between the image and the filters, where a filter is the set of a neuron's weights represented as a small image the size of the receptive field.
The above image explains the convolution operation. It is also possible to connect a large input layer to a smaller layer by spacing out the receptive fields; this spacing is controlled by the stride.
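To make the multiply-and-sum operation and the effect of the stride concrete, here is a minimal NumPy sketch of a 2-D convolution (no padding; the function name and example values are my own, not from the original post):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image; at each position, multiply the
    corresponding elements and sum them up to produce one output value."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16).reshape(4, 4).astype(float)  # a toy 4x4 "image"
kernel = np.ones((2, 2))                           # a toy 2x2 filter

print(convolve2d(image, kernel, stride=1).shape)  # (3, 3)
print(convolve2d(image, kernel, stride=2).shape)  # (2, 2)
```

Note how a longer stride shrinks the output: with stride 2 the 4×4 input maps to a 2×2 output, which is exactly how a large input layer gets connected to a smaller layer.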
Enough of the theory part, now let’s move on to the application of the Convolutional Neural Networks.
4. Vegetable Image Classification using CNN
The problem statement here is,
Given the images for 15 different vegetables. Develop an Image Classification model that correctly detects and classifies the images of vegetables to their corresponding labels.
You can find the dataset here.
Let’s take a look at the images we have along with their labels.
The images look pretty clear along with the labels. The next step before modelling is to preprocess the images; for that I have used Keras' ImageDataGenerator. The following code snippet demonstrates the use of ImageDataGenerator,
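The original snippet is not reproduced here, but a minimal sketch of the preprocessing might look like the following. The directory layout (one sub-folder per vegetable class), the 150×150 target size and the batch size are all assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (150, 150)  # assumed input size; the original may differ
BATCH_SIZE = 32        # assumed batch size

def make_generators(train_dir, val_dir):
    """Build training and validation generators that rescale pixel
    values to [0, 1]. Expects one sub-folder per vegetable class."""
    datagen = ImageDataGenerator(rescale=1.0 / 255)
    train_gen = datagen.flow_from_directory(
        train_dir, target_size=IMG_SIZE,
        batch_size=BATCH_SIZE, class_mode="categorical")
    val_gen = datagen.flow_from_directory(
        val_dir, target_size=IMG_SIZE,
        batch_size=BATCH_SIZE, class_mode="categorical")
    return train_gen, val_gen
```

With `class_mode="categorical"`, the generators yield one-hot labels inferred from the sub-folder names, which matches the softmax output layer used later.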
We have 15,000 images in the training set belonging to 15 different classes (1,000 images per class), and 3,000 images (200 per class) in each of the validation and test sets. Next, I have built a custom CNN model using the Keras API.
The above model comprises the following:
- 2 Convolutional Layers: one with 32 filters of size 3×3 and one with 64 filters of size 3×3, both with the ReLU activation function.
- 2 MaxPooling layers with a pool size of 2.
- 1 Flatten layer to flatten the 3-dimensional feature volume.
- 2 Dense (Fully-Connected) layers with 128 neurons and the ReLU activation function.
- 1 Dropout layer with a dropout rate of 25%.
- The final output layer with 15 neurons and softmax activation function which outputs the probability for each class.
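The layer list above can be sketched in Keras roughly as follows. The exact layer ordering (pooling after each convolution, dropout between the dense layers) and the 150×150×3 input shape are assumptions, since the original code is not shown:

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 15
IMG_SHAPE = (150, 150, 3)  # assumed input shape

model = models.Sequential([
    layers.Input(shape=IMG_SHAPE),
    layers.Conv2D(32, (3, 3), activation="relu"),   # 32 filters of size 3x3
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, (3, 3), activation="relu"),   # 64 filters of size 3x3
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                               # 3-D feature volume -> vector
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.25),                           # drop 25% of activations
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # per-class probabilities
])
```

The softmax output gives a probability for each of the 15 vegetable classes, and the predicted label is simply the class with the highest probability.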
You can read about all these layers in the Keras Documentation, here. The model is compiled and trained for 100 epochs with the following hyperparameters.
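The hyperparameters themselves are not shown in this excerpt, but a hedged sketch of the compile-and-train step might look like this. The Adam optimiser, categorical cross-entropy loss and early-stopping patience are assumptions, chosen because they are common defaults for multi-class image classification:

```python
from tensorflow.keras.callbacks import EarlyStopping

def compile_and_train(model, train_data, val_data, epochs=100):
    """Compile the CNN and train it with early stopping on the
    validation loss. Optimiser, loss and patience are assumptions."""
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    stopper = EarlyStopping(monitor="val_loss", patience=3,
                            restore_best_weights=True)
    return model.fit(train_data, validation_data=val_data,
                     epochs=epochs, callbacks=[stopper])
```

With early stopping, training halts once the validation loss stops improving, which is consistent with the run described below finishing well before the 100-epoch budget.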
The model trains for 14 epochs before early stopping. The following plot shows the training and validation loss along with accuracy on both the sets.
We are able to achieve 95% accuracy on the test set.
Let's test the model's predictions. The following function takes in the path of an image from the directory along with its actual label, and shows the image with both the actual and predicted labels.
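The prediction helper is not reproduced in this excerpt; a sketch of how such a function might look is below. The function name, the 150×150 target size and the rescaling step are assumptions made to match the earlier preprocessing:

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image as keras_image

def predict_and_show(model, img_path, actual_label, class_names,
                     target_size=(150, 150)):
    """Load the image at img_path, predict its class with the model,
    and display it with both the actual and predicted labels."""
    img = keras_image.load_img(img_path, target_size=target_size)
    arr = keras_image.img_to_array(img) / 255.0   # same rescaling as training
    probs = model.predict(arr[np.newaxis, ...])   # add a batch dimension
    predicted_label = class_names[int(np.argmax(probs))]

    plt.imshow(img)
    plt.title(f"Actual: {actual_label} | Predicted: {predicted_label}")
    plt.axis("off")
    plt.show()
    return predicted_label
```

The `class_names` list would come from the generator's class indices (e.g. `list(train_generator.class_indices)`), so that the argmax index maps back to a vegetable name.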
As we can see, the generated predictions are correct. In my next blog, I will apply transfer learning to the same dataset and compare the results of the two models.