Introduction
In this article, we will learn about image classification with Keras using deep learning. We will not use a convolutional neural network, just a simple deep neural network, which will still show very good accuracy. For this purpose, we will use the MNIST handwritten digits dataset, often considered the "Hello World" of deep learning tutorials. And since deep learning models train fast on GPUs, we will use Google Colab to build our model.
- Read More – Image Classification using Bag of Visual Words Model
- Read More – Keras Implementation of VGG16 Architecture from Scratch
Before we get hands-on, let us first understand the MNIST dataset.
MNIST Handwritten Digit Dataset

- The MNIST handwritten digit classification problem is a standard dataset used in computer vision and deep learning tasks. It is a dataset of 60,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9.
- The task is to classify a given image of a handwritten digit into one of 10 classes representing integer values from 0 to 9.
- I have downloaded the dataset from Kaggle to show how you can use your own dataset to train your model.
- On Kaggle, the dataset consists of two files, train.csv and test.csv. Both contain gray-scale images of hand-drawn digits, from zero through nine.
- Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
- The training dataset (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image, stored row by row (see the sketch below the list).
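To make this layout concrete, here is a minimal sketch of how the index in a pixel column name maps back to a position in the 28×28 grid (the helper function is ours, for illustration):

# Column "pixelX" corresponds to grid position (row, col), where X = row * 28 + col
def pixel_index_to_position(x):
    """Map the X in a 'pixelX' column name to its (row, col) in the 28x28 image."""
    return divmod(x, 28)

print(pixel_index_to_position(0))    # (0, 0)   -> top-left pixel
print(pixel_index_to_position(783))  # (27, 27) -> bottom-right pixel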
Setting up Google Colab

We have uploaded the dataset to our Google Drive, and we need to mount the Google Drive directory on our Colab runtime as shown below. This command generates a URL; click it, authenticate with your Google account, copy the authorization code back here, and press Enter.
from google.colab import drive
drive.mount('/content/gdrive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly
Enter your authorization code: ··········
Mounted at /content/gdrive
Now that we have set up Google Colab, let us start with the actual building of the image classification model with Keras.
Image Classification using Deep Neural Network with Keras
Importing required libraries
import cv2
import numpy as np
import os
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
Using TensorFlow backend.
Read the CSV files using Pandas
train_path="/content/gdrive/My Drive/train.csv"
test_path="/content/gdrive/My Drive/test.csv"
train = pd.read_csv(train_path)
print(train.shape)
train.head()
(42000, 785)
|   | label | pixel0 | pixel1 | pixel2 | … | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| 3 | 4 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | … | 0 | 0 | 0 |

5 rows × 785 columns (the middle pixel columns are elided here for readability)
Similarly, the test dataframe has 28,000 rows and 784 columns. There is no label column here, since the labels have to be predicted by our image classification model.
test = pd.read_csv(test_path)
print(test.shape)
test.head()
(28000, 784)
|   | pixel0 | pixel1 | pixel2 | … | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | … | 0 | 0 | 0 |

5 rows × 784 columns (the middle pixel columns are elided here for readability)
Reading Image from MNIST Dataset
As you might have understood by now, this dataset contains just the pixel values of handwritten digits, not actual image files. Although the image classification model can be trained and tested on these raw pixel values, we can reconstruct an image to see what it looks like.
Below we reshape the pixel values of the first training row (skipping the label column) to reconstruct the image, which is actually the digit 1.
image = train.iloc[0, 1:].values.reshape(28, 28)  # drop the label column before reshaping
plt.imshow(image)
<matplotlib.image.AxesImage at 0x7f6b83537eb8>
Data Preprocessing
X_train = train.iloc[:, 1:].values.astype('float32')  # pixel columns only
y_train = train.iloc[:, 0].values.astype('int32')     # the label column
X_test = test.values.astype('float32')
X_train
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
X_train.shape
(42000, 784)
y_train
array([1, 0, 1, ..., 7, 6, 9], dtype=int32)
As you can see above, y_train contains integers, and we will convert them into a one-hot encoding. In one-hot encoding, each digit is represented as a vector with a 1 at the index of that digit:
0 – [1,0,0,0,0,0,0,0,0,0]
1 – [0,1,0,0,0,0,0,0,0,0]
2 – [0,0,1,0,0,0,0,0,0,0]
3 – [0,0,0,1,0,0,0,0,0,0] … and so on.
We will use the Keras built-in function to_categorical() to perform the one-hot encoding.
from keras.utils.np_utils import to_categorical
y_train = to_categorical(y_train)
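As a quick sanity check, the first training label was 1, so after encoding we would expect a 1 at index 1:

print(y_train[0])     # expected: [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
print(y_train.shape)  # expected: (42000, 10)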
Now the final step in our preprocessing is data normalization, where we scale the grayscale values from the 0-255 range down to the 0-1 range. This helps the model converge faster.
X_train=X_train/255.0
X_test=X_test/255.0
Split Training set into Train and Validation set
As a good practice, it is better to split the training set into a train set and a validation set. The validation set is used during training to monitor how well the model generalizes to data it has not seen.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.10, random_state=42)
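With test_size=0.10, the 42,000 training rows are split 90/10. A quick shape check should confirm this (the same 37,800/4,200 counts appear in the training log later):

print(X_train.shape, y_train.shape)  # expected: (37800, 784) (37800, 10)
print(X_val.shape, y_val.shape)      # expected: (4200, 784) (4200, 10)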
Deep Neural Network Model Architecture

- Here the input layer has 784 neurons corresponding to each pixel of the image.
- There are two hidden layers, with 100 and 50 neurons respectively, both using the ReLU activation function.
- The output layer has 10 neurons, corresponding to the digits 0 to 9, and uses the softmax activation function.
- There are no fixed rules for how many hidden layers and neurons a neural network should have; this comes from trial and experimentation. The architecture we have chosen here has given us good results for image classification on the MNIST handwritten digits dataset.
- Also Read – Animated Explanation of Feed Forward Neural Network Architecture
- Also Read – Animated guide to Activation Functions in Neural Network
Implementation of Deep Neural Network with Keras
To create the neural network model, we have to import the following modules from the Keras library.
from keras import Sequential
from keras.layers import Dense
from keras import backend as K
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.optimizers import RMSprop
K.tensorflow_backend._get_available_gpus()  # confirm that the Colab GPU is visible to the TensorFlow backend
classifier = Sequential()
classifier.add(Dense(100, activation='relu', kernel_initializer='random_normal', input_shape=(784,), name='first_layer'))  # first hidden layer
classifier.add(Dense(50, activation='relu', kernel_initializer='random_normal', name='second_layer'))  # second hidden layer
classifier.add(Dense(10, activation='softmax'))  # output layer
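To verify the architecture, we can print a model summary. The parameter counts it reports should match what we expect by hand: 784×100+100 = 78,500 for the first hidden layer, 100×50+50 = 5,050 for the second, and 50×10+10 = 510 for the output layer, 84,060 in total.

classifier.summary()  # prints layer output shapes and parameter counts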
Before making the network ready for training, we have to compile it by specifying the following:
- Loss function: It measures how much the network's predictions deviate from the actual output. We use categorical cross-entropy as the loss function.
- Optimizer: It updates the network parameters to reduce the error given by the loss function in every training iteration. We use the RMSprop optimizer with a learning rate of 0.001.
- Metrics: It monitors the performance of the network; we choose accuracy as the metric.
classifier.compile(optimizer=RMSprop(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
Early Stopping
A major challenge in training neural networks is how long to train them.
Too little training will mean that the model will underfit the train and the test sets. Too much training will mean that the model will overfit the training dataset and have poor performance on the test set.
To deal with this problem, we use early stopping: training halts once the validation accuracy stops improving for 20 consecutive epochs, while a model checkpoint saves the weights of the best-performing model seen during the training phase.
mc = ModelCheckpoint('best_model.h5', monitor='val_accuracy', mode='max', save_best_only=True)  # keep only the best weights
es = EarlyStopping(monitor='val_accuracy', mode='max', verbose=1, patience=20)  # stop after 20 epochs without improvement
Training of Model
For training, we use the fit method of Keras. Here we pass X_train and y_train, the validation data (X_val, y_val), and the callbacks argument with our checkpointing and early-stopping callbacks. We will run it for 50 epochs.
classifier.fit(X_train,y_train,validation_data=(X_val,y_val),epochs=50,shuffle=True,callbacks=[mc,es])
Train on 37800 samples, validate on 4200 samples
Epoch 1/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.1223 - accuracy: 0.9893 - val_loss: 0.1295 - val_accuracy: 0.9886
Epoch 2/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0742 - accuracy: 0.9911 - val_loss: 0.0949 - val_accuracy: 0.9910
Epoch 3/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0502 - accuracy: 0.9930 - val_loss: 0.1186 - val_accuracy: 0.9888
Epoch 4/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.0345 - accuracy: 0.9946 - val_loss: 0.0984 - val_accuracy: 0.9912
Epoch 5/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.0257 - accuracy: 0.9954 - val_loss: 0.1109 - val_accuracy: 0.9895
Epoch 6/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0232 - accuracy: 0.9956 - val_loss: 0.1012 - val_accuracy: 0.9898
Epoch 7/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0161 - accuracy: 0.9966 - val_loss: 0.1138 - val_accuracy: 0.9867
Epoch 8/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0132 - accuracy: 0.9970 - val_loss: 0.0955 - val_accuracy: 0.9902
Epoch 9/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0090 - accuracy: 0.9979 - val_loss: 0.1090 - val_accuracy: 0.9912
Epoch 10/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0090 - accuracy: 0.9976 - val_loss: 0.1165 - val_accuracy: 0.9881
Epoch 11/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0066 - accuracy: 0.9984 - val_loss: 0.1281 - val_accuracy: 0.9867
Epoch 12/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0047 - accuracy: 0.9988 - val_loss: 0.1071 - val_accuracy: 0.9895
Epoch 13/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0066 - accuracy: 0.9985 - val_loss: 0.1270 - val_accuracy: 0.9890
Epoch 14/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.0043 - accuracy: 0.9988 - val_loss: 0.1373 - val_accuracy: 0.9867
Epoch 15/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0058 - accuracy: 0.9987 - val_loss: 0.1394 - val_accuracy: 0.9867
Epoch 16/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0037 - accuracy: 0.9990 - val_loss: 0.1341 - val_accuracy: 0.9883
Epoch 17/50
37800/37800 [==============================] - 3s 66us/step - loss: 0.0033 - accuracy: 0.9990 - val_loss: 0.1490 - val_accuracy: 0.9874
Epoch 18/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0034 - accuracy: 0.9992 - val_loss: 0.1362 - val_accuracy: 0.9895
Epoch 19/50
37800/37800 [==============================] - 2s 66us/step - loss: 0.0027 - accuracy: 0.9993 - val_loss: 0.1281 - val_accuracy: 0.9900
Epoch 20/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0023 - accuracy: 0.9993 - val_loss: 0.1461 - val_accuracy: 0.9871
Epoch 21/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0018 - accuracy: 0.9994 - val_loss: 0.1347 - val_accuracy: 0.9890
Epoch 22/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0028 - accuracy: 0.9992 - val_loss: 0.1455 - val_accuracy: 0.9888
Epoch 23/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0026 - accuracy: 0.9994 - val_loss: 0.1570 - val_accuracy: 0.9867
Epoch 24/50
37800/37800 [==============================] - 3s 67us/step - loss: 0.0020 - accuracy: 0.9996 - val_loss: 0.1391 - val_accuracy: 0.9883
Epoch 00024: early stopping
<keras.callbacks.callbacks.History at 0x7f6afeb6bd30>
classifier.evaluate(X_val,y_val)
4200/4200 [==============================] - 0s 28us/step
[0.13906813098443913, 0.9883333444595337]
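Note that evaluate above uses the weights from the final epoch, not necessarily the best ones. Since ModelCheckpoint saved the best weights to best_model.h5, we can also reload and evaluate that checkpoint; a minimal sketch:

from keras.models import load_model

best_model = load_model('best_model.h5')  # weights from the epoch with the highest val_accuracy
best_model.evaluate(X_val, y_val)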
Test the Model
image = X_test[0].reshape(28, 28)
plt.imshow(image)
<matplotlib.image.AxesImage at 0x7f3041da1f98>
Y_pred = classifier.predict(X_test)  # per-class probabilities, shape (28000, 10)
Y_pred_classes = np.argmax(Y_pred, axis=1)  # most probable digit for each image
print(Y_pred_classes[0])
2
image = X_test[1].reshape(28, 28)
plt.imshow(image)
<matplotlib.image.AxesImage at 0x7f3041aa7cf8>
print(Y_pred_classes[1])
0
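Since the test labels come from Kaggle, you may also want to write the predictions to a CSV for submission. A minimal sketch, assuming the competition's standard ImageId/Label format:

# Hypothetical submission file for Kaggle's Digit Recognizer competition
submission = pd.DataFrame({'ImageId': np.arange(1, len(Y_pred_classes) + 1),  # Kaggle IDs are 1-based
                           'Label': Y_pred_classes})
submission.to_csv('submission.csv', index=False)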
Conclusion
We hope this tutorial helped you understand image classification with deep learning using Keras. We saw that even a simple deep neural network, without any convolutional layers, can show reasonably good results on the MNIST handwritten digits dataset.