CONVOLUTIONAL NEURAL NETWORKs FOR MNIST DATASET CLASSIFICATION


B. Gnana Priya

Assistant Professor, Department of Computer Science and Engineering, Annamalai University

ABSTRACT

In this paper, classification of the MNIST dataset for handwritten digit recognition is carried out using convolutional neural networks (CNNs). CNNs have become popular because of their ability to learn features automatically from the images fed to them. The proposed work classifies the digits with an accuracy of 92%. CNNs prove to be among the best network models for image classification problems when ample training data are available.

KEYWORDS: Deep learning, Convolutional neural network, MNIST dataset.

1. INTRODUCTION

Convolutional Neural Networks (CNNs) are a category of neural networks that have proven very effective in areas such as image classification and image recognition. CNNs have been successful in identifying objects, human poses, faces, digits, traffic signs, etc., and are applied in various computer vision tasks. A CNN is designed to automatically and adaptively learn spatial hierarchies of features through backpropagation by using multiple building blocks, such as convolution layers, pooling layers, and fully connected layers.

1.1 Convolutional Neural Network

A CNN is a hierarchical learning model that learns features automatically during training instead of relying on a hand-designed feature extraction process.

Earlier approaches hand-designed a set of rules and algorithms to extract features from an image, which is a difficult and time-consuming process. In a CNN, the pixel intensity values of an image serve as the input, and a series of hidden layers extracts features from the image. Moving from lower to higher layers, the extracted features progress from simple to more complex and abstract. Lower-level layers detect simple features such as edges. Intermediate layers combine these simple features to find corners and outlines of objects. Higher-level layers (those near the end of the network) combine the edges, corners and outlines to form abstract objects.

Three types of layers are used here, viz. the convolutional layer, the pooling layer and the fully connected layer. The convolutional layer forms the basic building block and uses kernels to detect features all over the image. Each kernel carries out a convolution operation, which is an element-wise product and sum between two matrices: the kernel and the image patch it currently covers. Pooling layers are inserted between convolutional layers to reduce the number of parameters and the amount of computation in the network, thereby speeding up training on large data. They also downsample their input and help prevent overfitting. Weight sharing, in which the same kernel weights are applied at every image position, is used to speed up training on new data and improves the performance of the CNN.
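The two operations described above can be illustrated with a minimal NumPy sketch. The toy image, the vertical-edge kernel and the function names are assumptions chosen only for illustration; they are not taken from the proposed system. As is usual in CNN libraries, the kernel is applied without flipping.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`; each output is an element-wise product and sum."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical 3x3 vertical-edge kernel, chosen only for illustration.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = conv2d_valid(image, kernel)  # 4x4 feature map, strongest at the edge
pooled = max_pool2d(features)           # 2x2 map after 2x2 max pooling
```

The feature map responds only where the kernel straddles the edge, and pooling halves each spatial dimension while keeping the strongest response in each window.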

2. RELATED WORKS

In deep learning, convolutional neural networks (CNNs) [1, 2] are used for analyzing visual imagery. It has been reported that deep learning algorithms such as multilayer CNNs, built using Keras with Theano and TensorFlow, give the highest accuracy in comparison with widely used machine learning algorithms such as SVM, KNN and RFC. In 1998, the framework of CNNs was designed by LeCun et al. [3], which had seven layers. The MNIST handwritten digits database [4] is used for the experiment. Of the 70,000 scanned images of handwritten digits in the MNIST database, 60,000 are used for training the network and 10,000 are used to test it. All training and test images are grayscale with a size of 28×28 pixels. A training input is denoted x, where x is a 784-dimensional vector because each image has 28×28 = 784 pixels. The corresponding desired output is expressed by y(x), where y is a 10-dimensional vector. Other research has shown that deep nets perform better when they are trained by simple back-propagation; that architecture yields its lowest error rate on MNIST, compared with NORB and CIFAR10 [5]. The early machine learning approaches evaluated by LeCun et al. [6] included linear classifiers, with error rates ranging from 7.6% to 12%; K-nearest neighbors approaches, 1.1% to 5%; non-linear classifiers, 3.5%; support vector machines, 0.8% to 1.4%; neural networks, 1.6% to 4.7%; and convolutional neural networks, 0.7% to 1.7%.
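The input and target representation described above (a 784-dimensional x and a 10-dimensional y(x)) can be sketched as follows. This is a minimal illustration that assumes the usual one-hot encoding of the label, which the text does not spell out; the function names and the placeholder image are hypothetical.

```python
import numpy as np

def to_input_vector(image):
    """Flatten a 28x28 grayscale image into a 784-dimensional vector x."""
    image = np.asarray(image, dtype=np.float64)
    if image.shape != (28, 28):
        raise ValueError("expected a 28x28 image")
    return image.reshape(784)

def to_target_vector(digit, num_classes=10):
    """Encode a digit label as a 10-dimensional vector y(x) with a single 1 (assumed one-hot)."""
    y = np.zeros(num_classes)
    y[digit] = 1.0
    return y

image = np.zeros((28, 28))  # placeholder image, not real MNIST data
x = to_input_vector(image)  # shape (784,)
y = to_target_vector(6)     # 1.0 at index 6, 0.0 elsewhere
```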

With the use of data augmentation, better results can be achieved; the best error rate achieved using a convolutional neural network with no distortions and no preprocessing is reported in LeCun et al. [6]. Better results were attained by Meier et al. [7] using a committee of 25 neural networks and by Ciresan et al. [8] using a 6-layer neural network.

3. PROPOSED WORK

The MNIST dataset was developed by Yann LeCun, Corinna Cortes and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem. The dataset was constructed from a number of scanned document datasets available from the National Institute of Standards and Technology (NIST). Images of digits were taken from a variety of scanned documents, normalized in size and centered. Each image is 28 x 28 pixels, and there are 10 classes, one for each digit (0-9).

A standard split of the dataset is used to evaluate and compare models, where 60,000 images are used to train a model and a separate set of 10,000 images is used to test it. Fig(1) shows sample images from the dataset.

Fig(1) : Sample Images from MNIST Dataset

Fig(2) : Network Architecture of the Proposed system

The various steps carried out are:

• Capturing the data.

• Pre-processing the dataset.

• Modelling the convolutional neural network using Keras.

• Training the CNN for multiclass classification.

• Evaluating the model and predicting the output class of a test image.

• Finding the accuracy and loss.

• Plotting the confusion matrix.
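The pre-processing step in the list above is not detailed in the text. A common choice, shown here purely as an assumption, is to scale pixel intensities to the range [0, 1] and add a channel axis so that the images match the input layout expected by a convolutional layer. The function name and the synthetic batch are hypothetical.

```python
import numpy as np

def preprocess_images(images):
    """Scale uint8 pixel intensities to [0, 1] and add a trailing channel axis."""
    images = np.asarray(images, dtype=np.float32) / 255.0
    return images[..., np.newaxis]  # (n, 28, 28) -> (n, 28, 28, 1)

# Synthetic stand-in for a batch of MNIST images (random pixels, not real digits).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
prepared = preprocess_images(batch)
```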

4. NETWORK ARCHITECTURE

We use the Keras API, which is written in Python. Keras is designed for building neural networks and runs on top of TensorFlow. It allows us to build networks easily, extend them and add new modules in a simple manner. We use the Sequential model for building our network, in which the desired layers are added one by one. The Dense layer (fully connected layer) is used to build a feed-forward network in which every neuron in a layer is connected to all the neurons in the previous layer.

Fig(2) gives the proposed architecture of the network. It is a five-layer architecture with three convolutional and two fully connected layers. We use a standard filter size of 3x3 in all convolutional layers. The numbers of filters in the convolutional layers are 32, 64 and 128, respectively. There are 200 units in each fully connected layer. The ReLU activation function is used to give non-linearity to the model, which enables the network to learn non-linear decision boundaries. As our problem is a multiclass classification problem, we use a softmax layer as the final layer.
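The architecture described above might be expressed with the Keras Sequential model roughly as follows. This is a sketch under stated assumptions, not the exact network of Fig(2): the text does not give the pooling placement, padding or input layout, so 2x2 max pooling after each convolution, "same" padding, a 28x28x1 input and a separate 10-unit softmax output layer are all assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    """Three 3x3 convolutional layers (32, 64, 128 filters) and two 200-unit dense layers."""
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),          # assumed input layout
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),             # assumed pooling placement
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(200, activation="relu"),
        layers.Dense(200, activation="relu"),
        layers.Dense(10, activation="softmax"),  # assumed 10-way output layer
    ])

model = build_model()
```

Calling `model.summary()` prints the layer-by-layer output shapes, which is a quick way to check the sketch against Fig(2).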

To configure the network we use the SGD (Stochastic Gradient Descent) optimizer. The loss used is categorical cross-entropy, which is suited to multiclass classification. Accuracy and loss are the metrics tracked during the training process. The confusion matrix for the digit classes is plotted.

From this matrix we can infer the percentage of correct recognition of a particular digit and how often it is confused with other digits.
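The loss and the confusion-matrix reading described above can be sketched in NumPy as follows. The function names and the small label lists are made up for illustration; they are not results from the proposed system. The convention that rows index the true digit and columns the predicted digit is also an assumption.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean over samples of -sum(y_true * log(y_prob)); y_true is one-hot."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """Rows index the true digit, columns the predicted digit."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix

def per_digit_accuracy(matrix):
    """Fraction of each true digit predicted correctly (NaN if a digit is absent)."""
    totals = matrix.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.diag(matrix) / totals

# Made-up labels for three classes, for illustration only.
true_labels = [0, 1, 1, 2, 2, 2]
predicted_labels = [0, 1, 2, 2, 2, 1]
cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
acc = per_digit_accuracy(cm)

# A uniform two-class prediction gives a loss of ln(2) per sample.
loss = categorical_cross_entropy(np.array([[1.0, 0.0], [0.0, 1.0]]),
                                 np.array([[0.5, 0.5], [0.5, 0.5]]))
```

Each diagonal entry of the matrix divided by its row total gives the recognition rate of that digit, and the off-diagonal entries in the same row show which digits it is confused with.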

5. CONCLUSION

Today, convolutional neural networks are the driving force behind machine learning and computer vision applications such as robotics, medical diagnostics and self-driving cars. This is because of their simplicity and their ability to work with few parameters. In this work, a deep learning algorithm for MNIST digit classification is implemented using the Keras library running on top of TensorFlow. The proposed work classifies the digits with an accuracy of 92%.

6. REFERENCES

[1] Y. LeCun et al., "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989.

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.

[3] Y. LeCun et al., "Handwritten digit recognition with a backpropagation network," in Advances in Neural Information Processing Systems, 1990, pp. 396-404.

[4] Y. LeCun, "The MNIST database of handwritten digits," http://yann.lecun.com/exdb/mnist/, 1998.

[5] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, "Flexible, high performance convolutional neural networks for image classification," in Twenty-Second International Joint Conference on Artificial Intelligence, 2011.

[6] LeCun, Bottou, Bengio and Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, 1998.

[7] Meier, Ciresan, Gambardella and Schmidhuber, "Better digit recognition with a committee of simple neural nets," in Proceedings of the 2011 International Conference on Document Analysis and Recognition, September 2011.

[8] Ciresan, Meier, Gambardella and Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., 2010.

[9] B. Zhang and S. N. Srihari, "Fast k-nearest neighbor classification using cluster-based trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 4, 2004.