Digit Recognition CNN and GUI App

Handwritten digit recognition is the ability of a computer to recognize digits written by hand. It is a hard task for a machine because handwritten digits are imperfect and vary widely in style. A recognizer takes an image of a digit and outputs which digit the image shows.

By the end of this article you will have a digit recognition model that can predict on real-world images, plus a GUI app.

Prerequisites

1. Python
2. Python libraries: TensorFlow, Matplotlib (plus OpenCV for step 7 and tkinter for step 8)

You can also use online notebook platforms like Google Colab and/or Kaggle.

About MNIST Dataset

MNIST is probably one of the most popular datasets among machine learning and deep learning enthusiasts. It contains 60,000 training images of handwritten digits from zero to nine and 10,000 test images, so it has 10 classes. Each image is a 28×28 matrix in which every cell holds a grayscale pixel value from 0 to 255.

Steps for building the CNN

You can find the code in the full notebook here on GitHub.

1. Import these libraries

If any module is missing, install it and restart the kernel.

2. Preparing the Dataset

The output after running the cell will be as follows:

Here we download the dataset and split it into training, validation, and test sets of 55,000, 5,000, and 10,000 images respectively. We reshape each image to (28, 28, 1) and normalize the pixel values to the range 0 to 1 by dividing by 255. Next we convert the labels to categorical form, i.e. we one-hot encode them.
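A minimal sketch of this preparation step, written with NumPy so it runs on its own. The function name prepare_mnist and the choice to hold out the last 5,000 training images for validation are assumptions made for illustration; the notebook may split the data differently.

```python
import numpy as np


def prepare_mnist(x_train_full, y_train_full, x_test, y_test,
                  n_val=5000, n_classes=10):
    """Split, reshape, scale and one-hot encode MNIST-style arrays."""

    def images(x):
        # (N, 28, 28) uint8 -> (N, 28, 28, 1) float32 in the range [0, 1]
        return x.reshape(-1, 28, 28, 1).astype("float32") / 255.0

    def one_hot(y):
        # Row i of the identity matrix is the one-hot vector for class i.
        return np.eye(n_classes, dtype="float32")[y]

    # Assumption: the last n_val training images become the validation set.
    x_train, x_val = x_train_full[:-n_val], x_train_full[-n_val:]
    y_train, y_val = y_train_full[:-n_val], y_train_full[-n_val:]

    return ((images(x_train), one_hot(y_train)),
            (images(x_val), one_hot(y_val)),
            (images(x_test), one_hot(y_test)))
```

With TensorFlow installed, the raw arrays can come from tf.keras.datasets.mnist.load_data(), which returns (x_train, y_train), (x_test, y_test). Passing those four arrays to prepare_mnist gives the 55,000 / 5,000 / 10,000 split described above.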

3. Creating Model

Here we build a Sequential model with Conv2D and MaxPooling layers. If you are unfamiliar with these terms, I recommend going through the TensorFlow documentation.

The output layer uses the softmax activation function. Softmax turns the raw scores into probabilities that sum to 1, and the predicted digit is the class with the highest probability. For example, the scores [0.002, 0.32, 0.3442, 0.76] become roughly [0.17, 0.23, 0.24, 0.36], so the last class is predicted. You get the idea; for more you can always Google :)
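To see what softmax does to the example array, here is a small plain-Python sketch. The helper name softmax is mine; inside the model, TensorFlow applies the same function for you.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing and does not
    # change the result.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.002, 0.32, 0.3442, 0.76]
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the most likely class

print([round(p, 2) for p in probs])
print("predicted class:", predicted)
```

The largest score still wins, but the output is a probability for every class rather than a hard 0 or 1. Picking the winner is a separate argmax step.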

Next we compile the model.

We use the Adam optimizer with categorical_crossentropy as the loss function.

Finally, your model.summary() should look like this:
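A minimal sketch of building and compiling such a model. The number of layers, the filter counts, and the Dense size are illustrative assumptions, not necessarily the values used in the notebook; only the Conv2D/MaxPooling structure, the softmax output, the Adam optimizer, and the categorical_crossentropy loss come from the text.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model():
    # Layer counts and sizes below are assumptions for illustration.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(64, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),  # one probability per digit
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_model()
model.summary()
```

Because the layer sizes here are guesses, the summary this prints may differ from the one in the notebook.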

4. Adding a Model Checkpoint

We add a ModelCheckpoint callback to keep the best version of the model as measured on the validation dataset.

It saves the model weights whenever the loss on the validation dataset decreases.

5. Training

Finally, we train our model on the training dataset.

After you run the above cell and training completes, we load the model weights that performed best on our validation dataset.

If you want to visualize your model's accuracy and loss over time, please refer to the code on GitHub. I am getting straight to the prediction part.
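The checkpoint, training, and reload steps can be sketched together as below. The function name train_with_checkpoint, the file name best.weights.h5, and the default epochs and batch size are hypothetical. Monitoring val_loss is an assumption based on the remark above that weights are saved when the validation loss decreases.

```python
import tensorflow as tf


def train_with_checkpoint(model, x_train, y_train, x_val, y_val,
                          weights_path="best.weights.h5",
                          epochs=10, batch_size=64):
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        filepath=weights_path,
        monitor="val_loss",      # watch the loss on the validation set
        save_best_only=True,     # overwrite only when it improves
        save_weights_only=True,  # store the weights, not the whole model
        verbose=1,
    )
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=epochs, batch_size=batch_size,
                        callbacks=[checkpoint])
    # Restore the best weights seen during training.
    model.load_weights(weights_path)
    return history
```

The returned history object holds the per-epoch loss and accuracy, which is what you would plot to visualize training.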

6. Prediction

Now comes the moment we have all been waiting for...

First we will predict on images in the test dataset, then we will move on to real-world images.

Here we predict on all 10,000 test images and store the results in a variable 'y_hat'. Then we plot 15 random images and set each title to "prediction (actual)"; if the two match the title is green, otherwise it is red.
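A sketch of the plotting step, assuming y_hat holds the softmax outputs from model.predict(x_test) and y_test is one-hot encoded. The function name plot_predictions, the 3×5 grid, and the seed parameter are choices made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_predictions(x_test, y_test, y_hat, seed=None):
    """Show 15 random test images titled 'prediction (actual)'."""
    rows, cols = 3, 5
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(x_test), size=rows * cols, replace=False)

    fig, axes = plt.subplots(rows, cols, figsize=(10, 6))
    for ax, i in zip(axes.ravel(), picks):
        pred = int(np.argmax(y_hat[i]))     # most probable class
        actual = int(np.argmax(y_test[i]))  # labels are one-hot
        ax.imshow(np.squeeze(x_test[i]), cmap="gray")
        ax.set_title(f"{pred} ({actual})",
                     color="green" if pred == actual else "red")
        ax.axis("off")
    fig.tight_layout()
    return fig
```

Call plot_predictions(x_test, y_test, y_hat) followed by plt.show(). Leaving seed unset gives a different sample on every run.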

In the image here all are correct so every title is green.

Re-run the cell for another 15 random samples.

For the overall accuracy on the test data, run this code in a cell:

Our model has an accuracy of 99.02% on the test dataset.
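One way to compute that number from the stored predictions, as a small NumPy sketch. The helper name accuracy is hypothetical, and it assumes both arrays have one row per image and one column per class.

```python
import numpy as np


def accuracy(y_hat, y_true):
    """Fraction of samples whose most probable class matches the one-hot label."""
    predicted = np.argmax(y_hat, axis=1)
    actual = np.argmax(y_true, axis=1)
    return float(np.mean(predicted == actual))
```

For example, print(f"{accuracy(y_hat, y_test):.2%}") prints the test accuracy as a percentage. If the model was compiled with the accuracy metric, model.evaluate(x_test, y_test) reports the same quantity.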

7. Predicting on Real World Images

Before we create the GUI app, let's predict on real-world images.

We are using the cv2 library here.

First let me show you the code and later I will explain it.

Here I was using Google Colab, so I used the google.colab library to upload files. If you are using another platform, search Google for how to upload files on it (use your Googling skills 😉). The code loops over uploaded.keys() because we can upload more than one image, so change your code accordingly for any other platform. I would suggest wrapping it in a function and calling that.

What happens in the code is this: we load each image, convert it to grayscale, and resize it to (18, 18). Next we add 5 blank pixels of padding on every side, because the images in our dataset have a margin like this (see the photos above). After adding these 5 pixels all around, the image is (5+18+5, 5+18+5), i.e. (28, 28), which is the required size.

Next we reshape each image to (28, 28, 1), send it to the model, and display the image by plotting it.

Save the model with: model.save('mymodel.h5')

This file will be used in the next step.

8. GUI Application

For the next segment you can find the complete code here.

In line 19 of that code, change the path to wherever you stored the mymodel.h5 file from above.

Also, run this code on a PC, as online notebooks cannot open a separate window for the GUI.

Here we build the layout with the tkinter module.

After running the code, the layout looks like this:

Here you simply draw one or more digits on the canvas with the mouse and click the recognise button to get your output.

An example is shown below:
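To give a feel for the layout code, here is a stripped-down tkinter sketch of a drawing canvas with a recognise button. It is not the app from the repository: the window title, canvas size, brush radius, the Clear button, and the names brush_box and build_app are all assumptions, and the recognition itself is left as a callback you supply.

```python
import tkinter as tk

BRUSH_RADIUS = 8  # hypothetical brush size in pixels


def brush_box(x, y, r=BRUSH_RADIUS):
    """Bounding box of the dot drawn at (x, y)."""
    return x - r, y - r, x + r, y + r


def build_app(on_recognise):
    """Create the window; on_recognise(canvas) runs when the button is clicked."""
    root = tk.Tk()
    root.title("Handwritten Digit Recognition")
    canvas = tk.Canvas(root, width=400, height=300, bg="black", cursor="cross")
    canvas.grid(row=0, column=0, columnspan=2, padx=4, pady=4)

    def draw(event):
        # Leave a white dot wherever the mouse is dragged with the button held.
        canvas.create_oval(*brush_box(event.x, event.y),
                           fill="white", outline="white")

    canvas.bind("<B1-Motion>", draw)
    tk.Button(root, text="Recognise",
              command=lambda: on_recognise(canvas)).grid(row=1, column=0)
    tk.Button(root, text="Clear",
              command=lambda: canvas.delete("all")).grid(row=1, column=1)
    return root
```

Running build_app(lambda canvas: print("recognise clicked")).mainloop() opens the window. A real callback would capture the canvas contents, cut out each digit, and pass it through the same preprocessing and model as in step 7.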

The full code is available in this GitHub repository: handwritten-digit-recogniton

Feel free to collaborate and add new features.

If you run into any trouble, feel free to contact me.

Machine Learning Projects

Thursday, 28 January 2021
