Easy Implementation of CNN with MNIST data

 

Deep neural networks are well suited to handling large datasets. Different kinds of ANN suit different tasks; LSTMs, for example, are commonly used for NLP. Similarly, there is a type of neural network designed to work with images.

The AI system, which became known as AlexNet (named after its main creator, Alex Krizhevsky), won the 2012 ImageNet computer vision contest with a top-5 accuracy of about 85 per cent. The runner-up scored a modest 74 per cent on the test.

At the heart of AlexNet were Convolutional Neural Networks (CNNs), a particular type of neural network that roughly imitates human vision. Over the years, CNNs have become a very important part of many computer vision applications. So let's take a look at how CNNs work.

HUMAN VISION AND CNN

The idea of the CNN was neurobiologically motivated by the discovery of locally sensitive, orientation-selective nerve cells in the visual cortex. The inventors of the CNN designed a network structure that implicitly extracts relevant features. CNNs are a special kind of neural network.

In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other.
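
To make the definition concrete, here is a minimal sketch of a discrete convolution using NumPy's np.convolve. The two short arrays are made-up values standing in for the "two functions"; they are not taken from any dataset.

```python
import numpy as np

# Two short, made-up sequences standing in for the "two functions".
signal = np.array([1, 2, 3])
kernel = np.array([0, 1, 0.5])

# Full discrete convolution: result[n] = sum over k of signal[k] * kernel[n - k].
result = np.convolve(signal, kernel)
print(result)  # [0.  1.  2.5 4.  1.5]
```

The output is longer than either input because the kernel is allowed to slide past both ends of the signal.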

The core idea behind CNN:

  1. Local Connections: Each neuron is connected only to a small local region of the previous layer rather than to every input, and each set of such neurons represents a set of features.
  2. Layering: Represents the hierarchy of features that are learned.
  3. Spatial Invariance: Represents the capability of a CNN to learn abstractions that are invariant to changes in size, contrast, rotation and other variations.

Some famous CNNs are:

  • LeNet, 1998
  • AlexNet, 2012
  • VGGNet, 2014
  • ResNet, 2015

With this, we are done with the history of the CNN and have gone through some famous CNNs. Let's get down to how it works.

Working of CNN

Let's first understand what an image is. An image is a matrix of pixel values. A grayscale image has only one plane, with pixel values typically ranging from 0 to 255. An RGB image has three colour channels, which means it has 3 planes.
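
As a small illustration, the sketch below builds one grayscale and one RGB image as NumPy arrays and prints their shapes. The 4x4 size and the pixel values are made up for the example.

```python
import numpy as np

# A made-up 4x4 grayscale image: one plane of 8-bit intensities.
gray = np.array([[0, 64, 128, 255],
                 [32, 96, 160, 224],
                 [0, 0, 255, 255],
                 [10, 20, 30, 40]], dtype=np.uint8)

# A made-up 4x4 RGB image: three planes stacked along the last axis.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255  # fill the red plane, leaving green and blue at 0

print(gray.shape)  # (4, 4)
print(rgb.shape)   # (4, 4, 3)
```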

A CNN works on specific local details rather than on the whole image at once. It is convenient and effective to represent a smaller region with fewer parameters, thereby reducing computational complexity.

A CNN has what we call a convolution layer. It slides a filter over the image, and applying it gives a convolved feature. This feature is passed on to the next layer.
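
The sliding-filter idea can be sketched in plain NumPy. The helper name convolve2d_valid, the 5x5 image and the edge-detecting filter are all made up for illustration. Like most deep learning libraries, this sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    sum the elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Made-up image: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Made-up filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature = convolve2d_valid(image, kernel)
print(feature)
# [[3. 3. 0.]
#  [3. 3. 0.]
#  [3. 3. 0.]]
```

The convolved feature is large where the filter sits on the edge and zero where the image is flat. In a real CNN the filter values are not hand-picked like this; they are learned during training.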

Filters can be considered network parameters to be learned. If you change the stride, the convolved output changes: a larger stride moves the filter in bigger steps and produces a smaller output. When an RGB image is used as input to a CNN, the depth of the filter is always equal to the depth of the image (3 in the case of RGB and 1 in grayscale).
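
The effect of the stride on the output size follows a standard formula, sketched below. The helper name conv_output_size and its padding argument are assumptions added for illustration; the 28-pixel input and 3-pixel filter match the MNIST models later in this article.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output width (or height) of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A 3x3 filter on a 28x28 image, moving one pixel at a time.
print(conv_output_size(28, 3, stride=1))  # 26

# The same filter moving two pixels at a time gives a smaller output.
print(conv_output_size(28, 3, stride=2))  # 13

# Weights in one 3x3 filter: its depth matches the image depth.
print(3 * 3 * 1)  # 9 for grayscale
print(3 * 3 * 3)  # 27 for RGB
```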

What is a Pooling Layer?

The pooling layer gradually reduces the spatial size of each matrix within the feature map so that the number of parameters and the amount of computation in the network are reduced. The most commonly used pooling approach is max pooling, which keeps only the largest value in each window.
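
Here is a minimal NumPy sketch of max pooling with non-overlapping windows. The helper name max_pool2d and the 4x4 feature map are made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    # Drop any rows or columns that do not fill a complete window.
    trimmed = feature_map[:h_out * size, :w_out * size]
    # Split into windows, then take the max inside each window.
    blocks = trimmed.reshape(h_out, size, w_out, size)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]])

pooled = max_pool2d(fm)
print(pooled)
# [[6 4]
#  [8 9]]
```

A 4x4 feature map becomes 2x2, so every later layer has a quarter as many values to process.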

Figure: Pooling layer

The CNN architecture comprises multiple combinations of convolution and pooling layers. The resulting feature map is smaller than the original image. In the models below, the activation function is applied to the output of each convolution, and the reduced output of these layers (convolution + pooling) is then flattened and passed to the classifier.

Figure: Working of the CNN

With this, we are done with the theory. There are other things for you to ponder, such as deep CNNs or how these networks differ from fully connected ANNs.

Implementation of CNN from scratch on MNIST

Let's start by importing the basic libraries.

import tensorflow as tf
import numpy as np
from tensorflow import keras
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical
import pandas as pd
import datetime
%matplotlib inline

I am implementing this on Kaggle, so I will upload their dataset, but you can get the MNIST dataset from Keras.

from tensorflow.keras.datasets import mnist
# load_data() returns the training and test splits as NumPy arrays
(train_X, train_y), (test_X, test_y) = mnist.load_data()
# You can go ahead and display an image using imshow()
plt.imshow(train_X[0], cmap='gray')

We will convert the target variable to categorical form using one-hot encoding. We have 10 classes, and to_categorical() will do this for you. We will also convert the integer pixel values to floats and normalize the data by dividing all values by 255.0. Why 255? That is something for you to find out.
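
The preparation step described above might look like the sketch below. It assumes that feature_new and target_new, the names used when fitting the models, are the normalized images and the one-hot labels. The helper name prepare_mnist is made up, and the one-hot encoding is done with plain NumPy here so the sketch runs without TensorFlow; to_categorical(train_y, 10) should give an equivalent one-hot matrix.

```python
import numpy as np

def prepare_mnist(images, labels, num_classes=10):
    """Reshape, scale and one-hot encode MNIST-style arrays."""
    # Add the single grayscale channel, convert to float, divide by 255.0.
    features = images.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    # One-hot encode: row i has a 1.0 in column labels[i].
    targets = np.eye(num_classes, dtype="float32")[labels]
    return features, targets

# Tiny synthetic batch, used only to show the resulting shapes.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 3])

feature_demo, target_demo = prepare_mnist(fake_images, fake_labels)
print(feature_demo.shape, target_demo.shape)  # (4, 28, 28, 1) (4, 10)

# With the real arrays loaded earlier (assumed usage):
# feature_new, target_new = prepare_mnist(train_X, train_y)
```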

Now we are done with preparing our feature and target. We will begin now with our baseline model. Our baseline model will have two parts:

  1. Base CNN model with one Conv2D and one pooling layer.
  2. Another part is a classifier to predict the class.

model = keras.Sequential([
    # This part is a CNN with 32 filters of size (3,3). Image size is (28,28) and 1 for grayscale. This will do feature extraction.
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    # This part is the classifier, which will do the prediction
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Optional: uncomment and pass callbacks=[tb_callback] to fit() to log to TensorBoard
#tb_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
model.compile(optimizer='SGD',
              loss='CategoricalCrossentropy',
              metrics=['accuracy'])
# feature_new and target_new are the normalized images and one-hot labels prepared above
history = model.fit(feature_new, target_new, epochs=15)
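
As a rough check on the baseline architecture, the sketch below traces the output size of each layer and counts the trainable parameters by hand. The helper names conv_params and dense_params are made up; the layer sizes are the ones from the baseline model, assuming the Keras defaults of stride 1 and no padding for Conv2D.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

conv_out = 28 - 3 + 1            # 3x3 filter, stride 1, no padding -> 26
pool_out = conv_out // 2         # 2x2 max pooling -> 13
flat = pool_out * pool_out * 32  # flattened length -> 5408

total = (conv_params(3, 1, 32)       # 320
         + dense_params(flat, 100)   # 540900
         + dense_params(100, 10))    # 1010
print(conv_out, pool_out, flat, total)  # 26 13 5408 542230
```

Almost all of the parameters sit in the first Dense layer, which is one reason pooling before Flatten matters: without it the flattened vector would be four times longer.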

This baseline gives us 98.20% accuracy, as reported by fit() on the training data. Now we will tinker with our baseline model.

The first thing we will use is BatchNormalization(). Why BatchNormalization?

BatchNormalization standardizes the inputs a layer receives from the previous layer. A neural network is typically trained on small groups of input samples called batches. Accordingly, batch normalization computes its mean and variance over each batch rather than over a single input.
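
The core computation can be sketched in NumPy as below. This only shows the training-time behaviour on one batch; the Keras layer additionally learns the scale and shift values and keeps moving averages for use at inference time. The function name batch_norm, the default arguments and the sample batch are made up for illustration.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Standardize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * x_hat + beta

# Made-up batch of 3 samples with 2 features on very different scales.
batch = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])

normed = batch_norm(batch)
print(normed.mean(axis=0))  # close to 0 for both features
print(normed.std(axis=0))   # close to 1 for both features
```

After normalization both features live on the same scale, whatever their original ranges were.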

So let's implement it with our baseline and check if it makes any difference.

model_batch = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation='softmax')
])
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
model_batch.compile(optimizer='SGD',
                    loss='CategoricalCrossentropy',
                    metrics=['accuracy'])
history = model_batch.fit(feature_new, target_new, epochs=15)

We can see our accuracy increased noticeably: we are now at 99.90%.

Next, we will try a VGG-like pattern and see what results we get. The architecture looks something like this:

  • Input layer
  • Batch Normalization
  • 2 layers of Conv2D
  • 1 layer of pooling
  • and our classifier
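
# VGG-like pattern
model_vg = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation='softmax')
])
# Assumed: compiled and trained with the same settings as the earlier models
model_vg.compile(optimizer='SGD',
                 loss='CategoricalCrossentropy',
                 metrics=['accuracy'])
history = model_vg.fit(feature_new, target_new, epochs=15)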

So we can see a slight improvement in accuracy, to 99.94%.

With this, we have learnt many things today. We learnt about the CNN and its history, and went through its implementation. There is so much more to CNNs; there are literally books written on this one topic. But I hope this article gives you a basic starting point for CNNs and computer vision.

Thank you all for reading my article.

Follow me on @NancyPandey

My notebook is on Kaggle. Please visit here and give it a like.

Happy coding and keep learning :)
