Learn Image Classification with Deep Neural Network using Keras

Introduction

In this article, we will learn image classification with Keras using deep learning. We will not use a convolutional neural network, just a simple deep neural network, which still reaches very good accuracy. For this purpose, we will use the MNIST handwritten digits dataset, which is often considered the Hello World of deep learning tutorials. And since deep learning models train much faster on GPUs, we will use Google Colab to build our model.

Before we do the actual hands-on, let us first understand the MNIST dataset.

MNIST Handwritten Digit Dataset

  • The MNIST handwritten digit classification problem is a standard benchmark in computer vision and deep learning. The original dataset has 60,000 training images and 10,000 test images, each a small square 28×28 pixel grayscale image of a single handwritten digit between 0 and 9.
  • The task is to classify a given image of a handwritten digit into one of 10 classes representing integer values from 0 to 9.
  • I have downloaded the dataset from Kaggle to show how you can use your own dataset to train your model.
  • On Kaggle the dataset contains two files, train.csv and test.csv. Both contain grayscale images of hand-drawn digits, from zero through nine.
  • Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
  • The training data set (train.csv) has 785 columns. The first column, called “label”, is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image.
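
To make the column layout concrete, here is a minimal NumPy sketch of how one row of train.csv splits into a label and a 28×28 image. The row is made up for illustration (it is not taken from the real file), and we assume the pixels are stored row by row, which is the order that reshape(28, 28) uses later in this article.

```python
import numpy as np

# Hypothetical row shaped like one line of train.csv:
# column 0 is the label, columns 1..784 are the pixel values.
row = np.zeros(785, dtype=np.int64)
row[0] = 7                   # made-up label
row[1 + 5 * 28 + 3] = 255    # made-up dark pixel at image row 5, column 3

label = row[0]
pixels = row[1:]                  # the 784 pixel columns
image = pixels.reshape(28, 28)    # row-major: pixel index = 28 * i + j

print(label, pixels.size, image.shape, image[5, 3])
```

The same slicing (first column for the label, the rest for the pixels) is what we apply to the whole dataframe in the preprocessing step.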

Setting up Google Colab

Google Colab is a GPU-enabled cloud platform for training deep learning models. The free plan of Google Colab allows you to train a deep learning model for up to 12 hours before the runtime disconnects. You have to select GPU as the runtime type before running the Jupyter notebook, as shown below:
Google Colab GPU Runtime

We have uploaded the dataset to our Google Drive, and we need to mount the Google Drive directory in our Jupyter runtime environment as shown below. This command generates a URL; click on it, authenticate with your Google Drive account, copy the authorization code, paste it here and press Enter.

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/gdrive

Now that we have set up Google Colab, let us start building the image classification model with Keras.

Image Classification using Deep Neural Network with Keras

Importing required libraries

In [2]:
import cv2
import numpy as np
import os
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
Using TensorFlow backend.

Read the CSV files using Pandas

Here we assign the paths of the training file and the testing file to two variables and then read both train.csv and test.csv into pandas dataframes.
In [3]:
train_path="/content/gdrive/My Drive/train.csv"
test_path="/content/gdrive/My Drive/test.csv"
In [4]:
train = pd.read_csv(train_path)
If we check the shape of the train dataframe, we can see that it has 42000 rows and 785 columns. 784 of these columns hold the pixel values of the image, and the remaining column is the label of the image. This can be seen when we fetch the first few rows using the head() function.
In [5]:
print(train.shape)
train.head()
(42000, 785)
Out[5]:
   label  pixel0  pixel1  pixel2  ...  pixel781  pixel782  pixel783
0      1       0       0       0  ...         0         0         0
1      0       0       0       0  ...         0         0         0
2      1       0       0       0  ...         0         0         0
3      4       0       0       0  ...         0         0         0
4      0       0       0       0  ...         0         0         0

5 rows × 785 columns

Similarly, the test dataframe has 28000 rows and 784 columns. There is no label column here, since the label has to be predicted by our image classification model.

In [6]:

test= pd.read_csv(test_path)
print(test.shape)
test.head()
(28000, 784)
Out[6]:
   pixel0  pixel1  pixel2  ...  pixel781  pixel782  pixel783
0       0       0       0  ...         0         0         0
1       0       0       0  ...         0         0         0
2       0       0       0  ...         0         0         0
3       0       0       0  ...         0         0         0
4       0       0       0  ...         0         0         0

5 rows × 784 columns

Reading Image from MNIST Dataset

As you might have understood by now, this dataset contains only the pixel values of handwritten digits, not actual image files. Although the image classification model can be trained and tested on raw pixels, we can reconstruct an image to see what it looks like.

Below we have plotted the pixel values of the first training example to reconstruct the image, which is the digit 1.

In [7]:
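# X_train is only created in the preprocessing step, so read the pixels from the dataframe here
image = train.iloc[0, 1:].values.reshape(28, 28)
plt.imshow(image)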
Out[7]:
<matplotlib.image.AxesImage at 0x7f6b83537eb8>

Data Preprocessing

First of all, we will convert the pandas dataframes into numpy arrays.
In [8]:
X_train = (train.iloc[:,1:].values).astype('float32') 
y_train = train.iloc[:,0].values.astype('int32') 
X_test = test.values.astype('float32')
In [9]:
X_train
Out[9]:
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
In [10]:
X_train.shape
Out[10]:
(42000, 784)
In [11]:
y_train
Out[11]:
array([1, 0, 1, ..., 7, 6, 9], dtype=int32)

As you can see above, y_train contains integers, and we will convert it into one-hot encoding. In one-hot encoding, each digit is represented as a vector of length 10 with a 1 at the index of that digit, as follows:

0 – [1,0,0,0,0,0,0,0,0,0]

1 – [0,1,0,0,0,0,0,0,0,0]

2 – [0,0,1,0,0,0,0,0,0,0]

3 – [0,0,0,1,0,0,0,0,0,0] .. and so on.

We will use the Keras built-in function to_categorical() to perform one-hot encoding.

In [12]:
from keras.utils import to_categorical
In [13]:
y_train= to_categorical(y_train)
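
To see what this encoding looks like without running Keras, here is a NumPy-only sketch that produces the same kind of output for integer labels. The helper name one_hot is our own for illustration and is not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row.

    Illustrative helper only; digit d gets its 1 at index d.
    """
    labels = np.asarray(labels, dtype=np.int64)
    # Row d of the identity matrix is exactly the one-hot vector for digit d.
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([1, 0, 4])
print(encoded)
```

Taking np.argmax along each row of the encoded array gives back the original integer labels, which is how we decode predictions later on.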

Now the final step in our preprocessing is data normalization, where we scale the grayscale values from the range 0-255 to the range 0-1. This helps the model converge faster during training.

In [14]:
X_train=X_train/255.0
X_test=X_test/255.0
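
As a quick check of what the division does, the sketch below applies the same scaling to a few made-up pixel values that span the full 0-255 range. The array is hypothetical and only serves to show the resulting range and dtype.

```python
import numpy as np

# Hypothetical raw pixel values covering the full grayscale range.
raw = np.array([[0, 64, 128, 255]], dtype=np.float32)

# Same operation as in the cell above: divide every pixel by 255.
scaled = raw / 255.0

print(scaled.min(), scaled.max(), scaled.dtype)
```

The smallest value stays at 0, the largest becomes 1, and the array keeps its float32 dtype.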

Split Training set into Train and Validation set

As a good practice, we split the training set into a train set and a validation set. The validation set is used during training to check how well the network performs on data it is not being trained on.

In [15]:
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.10, random_state=42)
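
The idea behind the split can be sketched with plain NumPy: shuffle the row indices and hold out 10% of them. This is only an illustration of the principle, not scikit-learn's implementation, so it will not pick the same rows as train_test_split; the function name split_indices is hypothetical.

```python
import numpy as np

def split_indices(n_samples, test_size=0.10, seed=42):
    """Shuffle sample indices and hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * test_size))
    # The first n_val shuffled indices form the validation set, the rest the train set.
    return order[n_val:], order[:n_val]

train_idx, val_idx = split_indices(42000)
print(len(train_idx), len(val_idx))
```

With 42000 rows and a 10% hold-out this gives 37800 training and 4200 validation samples, the same sizes that Keras reports in the training log.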

Deep Neural Network Model Architecture

 

Image Classification with Deep Neural Network- MNIST Handwritten Digits
  1. Here the input layer has 784 neurons, one for each pixel of the image.
  2. There are two hidden layers with 100 and 50 neurons, both with ReLU activation functions.
  3. The output layer has 10 neurons corresponding to the digits 0-9 and has a softmax activation function.
  4. There are no fixed rules for how many hidden layers and neurons a neural network should have. This comes from trial and experimentation. The architecture that we have chosen has given us good results for image classification with the MNIST handwritten digits dataset.
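
To get a feel for the size of this architecture, we can count its trainable parameters. The sketch below assumes every layer is fully connected with one bias per neuron, which is how a Dense layer behaves by default.

```python
# Layer widths from the architecture above: input, two hidden layers, output.
layer_sizes = [784, 100, 50, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)
```

Almost all of the parameters sit in the first hidden layer, because it is connected to all 784 input pixels.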

Implementation of Deep Neural Network with Keras

To create the neural network model we have to import the following modules from Keras libraries.

In [16]:
from keras.preprocessing.image import ImageDataGenerator
from keras import Sequential
from keras.layers import Dense,Flatten,Conv2D,MaxPooling2D,Activation,Dropout
from keras import backend as K
from keras.callbacks import EarlyStopping,ModelCheckpoint
from keras.optimizers import RMSprop
The code below lists the GPUs available to the Keras backend:
In [16]:
K.tensorflow_backend._get_available_gpus()
['/job:localhost/replica:0/task:0/device:GPU:0']
Let us now implement the neural network that we discussed above by using Keras.
In [17]:
classifier = Sequential()
# First hidden layer (receives the 784 pixel values)
classifier.add(Dense(100, activation='relu', kernel_initializer='random_normal', input_shape=(784,), name='first_layer'))
# Second hidden layer
classifier.add(Dense(50, activation='relu', kernel_initializer='random_normal', name='second_layer'))
# Output layer
classifier.add(Dense(10, activation='softmax'))
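
To show what these three layers compute, here is a NumPy sketch of a forward pass through a network of the same shape. The weights are random (the standard deviation of 0.05 is our own assumption) and the input batch is made up, so the output probabilities are meaningless; the point is only the sequence of operations.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Random, untrained weights for illustration only.
W1, b1 = rng.normal(0.0, 0.05, (784, 100)), np.zeros(100)
W2, b2 = rng.normal(0.0, 0.05, (100, 50)), np.zeros(50)
W3, b3 = rng.normal(0.0, 0.05, (50, 10)), np.zeros(10)

def forward(x):
    h1 = relu(x @ W1 + b1)          # first hidden layer, 100 units
    h2 = relu(h1 @ W2 + b2)         # second hidden layer, 50 units
    return softmax(h2 @ W3 + b3)    # 10 class probabilities per image

batch = rng.random((4, 784))        # four made-up "images" with values in [0, 1)
probs = forward(batch)
print(probs.shape, probs.sum(axis=1))
```

Each row of the output has 10 non-negative values that sum to 1, one probability per digit.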

Before the network is ready for training, we have to compile it by specifying the following:

  1. Loss function: It measures how much the prediction of the neural network deviates from the actual output. We use categorical cross-entropy as the loss function.
  2. Optimizer: It updates the network parameters in every training iteration to reduce the error reported by the loss function. We use the RMSprop optimizer with a learning rate of 0.001.
  3. Metrics: It monitors the performance of the network, and we choose accuracy as the metric.
In [18]:
# learning_rate is the current name of the argument that older Keras releases called lr
classifier.compile(optimizer=RMSprop(learning_rate=0.001),loss='categorical_crossentropy',metrics=['accuracy'])
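
To build some intuition for the loss, here is a small NumPy sketch of categorical cross-entropy. It follows the textbook definition (the mean over samples of the negative log-probability assigned to the true class); the clipping constant and the three-class toy data are our own assumptions, chosen to keep the example short.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two made-up samples with one-hot labels (true classes 2 and 0).
y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=np.float64)
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.30, 0.40], [0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions pick the right class, but the confident one gets a much lower loss. This is why the loss can keep changing during training even when the accuracy barely moves.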

Early Stopping

A major challenge in training neural networks is deciding how long to train them.
Too little training means that the model will underfit both the train and the test sets. Too much training means that the model will overfit the training dataset and perform poorly on the test set.

To deal with this problem we use early stopping: EarlyStopping halts training when the validation accuracy has not improved for 20 epochs (the patience), and ModelCheckpoint saves the best weights seen during training to best_model.h5.

In [19]:
mc = ModelCheckpoint('best_model.h5', monitor='val_accuracy', mode='max', save_best_only=True)
es=EarlyStopping(monitor='val_accuracy', mode='max', verbose=1, patience=20)

Training of Model

For training, we use the fit method of Keras. Here we pass X_train, y_train and the validation data (X_val, y_val), and supply the checkpoint and early stopping callbacks through the callbacks argument. We will run it for up to 50 epochs.

In [20]:
classifier.fit(X_train,y_train,validation_data=(X_val,y_val),epochs=50,shuffle=True,callbacks=[mc,es])
Train on 37800 samples, validate on 4200 samples
Epoch 1/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.1223 - accuracy: 0.9893 - val_loss: 0.1295 - val_accuracy: 0.9886
Epoch 2/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0742 - accuracy: 0.9911 - val_loss: 0.0949 - val_accuracy: 0.9910
Epoch 3/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0502 - accuracy: 0.9930 - val_loss: 0.1186 - val_accuracy: 0.9888
Epoch 4/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.0345 - accuracy: 0.9946 - val_loss: 0.0984 - val_accuracy: 0.9912
Epoch 5/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.0257 - accuracy: 0.9954 - val_loss: 0.1109 - val_accuracy: 0.9895
Epoch 6/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0232 - accuracy: 0.9956 - val_loss: 0.1012 - val_accuracy: 0.9898
Epoch 7/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0161 - accuracy: 0.9966 - val_loss: 0.1138 - val_accuracy: 0.9867
Epoch 8/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0132 - accuracy: 0.9970 - val_loss: 0.0955 - val_accuracy: 0.9902
Epoch 9/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0090 - accuracy: 0.9979 - val_loss: 0.1090 - val_accuracy: 0.9912
Epoch 10/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0090 - accuracy: 0.9976 - val_loss: 0.1165 - val_accuracy: 0.9881
Epoch 11/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0066 - accuracy: 0.9984 - val_loss: 0.1281 - val_accuracy: 0.9867
Epoch 12/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0047 - accuracy: 0.9988 - val_loss: 0.1071 - val_accuracy: 0.9895
Epoch 13/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0066 - accuracy: 0.9985 - val_loss: 0.1270 - val_accuracy: 0.9890
Epoch 14/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.0043 - accuracy: 0.9988 - val_loss: 0.1373 - val_accuracy: 0.9867
Epoch 15/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0058 - accuracy: 0.9987 - val_loss: 0.1394 - val_accuracy: 0.9867
Epoch 16/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0037 - accuracy: 0.9990 - val_loss: 0.1341 - val_accuracy: 0.9883
Epoch 17/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.0033 - accuracy: 0.9990 - val_loss: 0.1490 - val_accuracy: 0.9874
Epoch 18/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0034 - accuracy: 0.9992 - val_loss: 0.1362 - val_accuracy: 0.9895
Epoch 19/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0027 - accuracy: 0.9993 - val_loss: 0.1281 - val_accuracy: 0.9900
Epoch 20/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0023 - accuracy: 0.9993 - val_loss: 0.1461 - val_accuracy: 0.9871
Epoch 21/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0018 - accuracy: 0.9994 - val_loss: 0.1347 - val_accuracy: 0.9890
Epoch 22/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0028 - accuracy: 0.9992 - val_loss: 0.1455 - val_accuracy: 0.9888
Epoch 23/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0026 - accuracy: 0.9994 - val_loss: 0.1570 - val_accuracy: 0.9867
Epoch 24/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0020 - accuracy: 0.9996 - val_loss: 0.1391 - val_accuracy: 0.9883
Epoch 00024: early stopping
Out[20]:
<keras.callbacks.callbacks.History at 0x7f6afeb6bd30>
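
The log above ends at epoch 24 instead of epoch 50. The sketch below replays the patience rule on the val_accuracy values copied (rounded to four decimals, as printed) from that log. It is a simplified model of the rule as we understand it, not Keras source code: an epoch counts as an improvement only if it is strictly better than the best value so far, and training stops once 20 epochs in a row bring no improvement.

```python
# val_accuracy per epoch, copied from the training log above.
val_accuracy = [
    0.9886, 0.9910, 0.9888, 0.9912, 0.9895, 0.9898, 0.9867, 0.9902,
    0.9912, 0.9881, 0.9867, 0.9895, 0.9890, 0.9867, 0.9867, 0.9883,
    0.9874, 0.9895, 0.9900, 0.9871, 0.9890, 0.9888, 0.9867, 0.9883,
]

def early_stopping_epoch(history, patience):
    """Return (epoch at which training stops, epoch with the best value)."""
    best = float("-inf")
    best_epoch = None
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value > best:                 # strict improvement resets the counter
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch              # patience never ran out

stopped_at, best_epoch = early_stopping_epoch(val_accuracy, patience=20)
print(stopped_at, best_epoch)
```

Under these assumptions the best validation accuracy is reached at epoch 4, and 20 epochs without improvement later the rule fires at epoch 24, which matches the "early stopping" message in the log.
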
Let us verify how well our trained model has performed on the validation set. We can see that it has achieved an accuracy of 98.8% on the validation set, which is a strong result for such a simple network.
In [21]:
classifier.evaluate(X_val,y_val)
4200/4200 [==============================] - 0s 28us/step
Out[21]:
[0.13906813098443913, 0.9883333444595337]

Test the Model

We will now test the model on the first two images of the test set. We will first reconstruct the image of the digit from the test data and then run our classifier to check whether it identifies the digit correctly.
We can see here that the model correctly identifies 2 and 0.
In [21]:
image = X_test[0].reshape( 28, 28)
plt.imshow(image)
Out[21]:
<matplotlib.image.AxesImage at 0x7f3041da1f98>
In [22]:
Y_pred = classifier.predict(X_test)
Y_pred_classes = np.argmax(Y_pred,axis = 1) 
In [23]:
print(Y_pred_classes[0])
2
In [24]:
image = X_test[1].reshape( 28, 28)
plt.imshow(image)
Out[24]:
<matplotlib.image.AxesImage at 0x7f3041aa7cf8>
In [25]:
print(Y_pred_classes[1])
0
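
The step from probabilities to a predicted digit is just an argmax over the 10 outputs. The sketch below uses made-up softmax outputs for two images (they are not the real predictions of our model) to show what np.argmax(Y_pred, axis=1) does.

```python
import numpy as np

# Hypothetical softmax outputs for two images; each row sums to 1.
probs = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.85, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01],
])

classes = np.argmax(probs, axis=1)   # index of the largest probability per row
confidence = probs.max(axis=1)       # the probability assigned to that class

print(classes, confidence)
```

The index of the largest value in each row is the predicted digit, and the value itself can be read as how confident the network is in that prediction.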

Conclusion

We hope this tutorial helped you understand image classification with deep learning using Keras. We saw that even a simple deep neural network can achieve good results on the MNIST handwritten digits dataset.

 
