
Face Recognition from Real Time Videos using Autoencoders


Academic year: 2023


Full text

In submitting my thesis work on 'Face Recognition from Real Time Video Using Stacked Autoencoders', I would like to take this opportunity to express my deep sense of gratitude to all those who directly or indirectly helped during my thesis work. My advisor's advice and encouragement were behind the successful completion of this thesis. This thesis presents an overview of the problem of facial recognition from real-time surveillance video.

It includes a survey of face detection algorithms, along with the implementation and testing of methods for extracting facial features, reducing dimensionality, and preparing raw data for proper feature extraction. Human face detection and recognition play an important role in many applications, such as video surveillance, biometrics, law enforcement and system security. My goal, which I believe I have achieved, was to develop a robust and accurate Deep Learning method, Stacked Autoencoders, for unsupervised facial recognition.

My database consists of the YouTube database, real-time webcam data, IITH CCTV data and the Chokepoint database. Recognition plays an important role in finding the identity and emotions in the human face.

Applications and Challenges of Face Recognition

Description of Video Quality

  • Illumination
  • Pose variation
  • Low resolution
  • Obstruction
  • Noise and blur

We humans can recognize a large number of faces, learned throughout our lives, and we can identify those faces even after many years. Since lighting is usually not uniform in the area covered by surveillance cameras, the illumination on a human face can vary significantly. This changes the intensity of the image and can even cause different parts of the face to be lit differently, which creates problems for the recognition system.

In surveillance videos, the subject moves freely in the camera's field of view, and the captured face appears in different poses in different cameras, which makes matching faces across views difficult. Surveillance cameras normally capture images at low resolution, so the pixels in facial images are very limited. If the camera is far from the subject, the subject's face will be small, causing the number of pixels on the face to be low and making robust facial recognition difficult.

Captured images are often corrupted by noise, and subject motion usually introduces blur, both of which cause problems for recognition systems.

Figure 1.1: Face Recognition at different lighting conditions

Methods and Database for Face Recognition

Face Recognition using Deep Learning

  • Haar Features
  • Integral image
  • Adaboost
  • Cascading

The Viola-Jones face detection algorithm is a powerful tool that can process images with extremely fast detection speed. AdaBoost is a learning algorithm used to select features by removing redundant ones, which produces a highly efficient classifier, and the classifiers are ultimately combined in a cascade to quickly discard background areas of the image. In the object detection procedure, images are classified based on the values of simple features rather than directly on the pixels.

In the face detection algorithm, we use five types of features. The first type, a two-rectangle feature, is the difference between the sum of the pixels in the white rectangle area and in the black rectangle area of two horizontal rectangles. A three-rectangle feature is computed as the difference between the sum within the two outer white rectangles and the central black area. Finally, a four-rectangle feature is the difference between the sums of diagonal pairs of white and black rectangles.

In the two-region case, the difference between the sums of pixels within region A and region B gives a Haar feature that resembles the eye region. In the three-region case, the difference between the sums of the outer regions C and E and the central region D gives a Haar feature similar to the nose. The integral image at a particular location is the sum of the pixels above and to the left of that location.
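The integral image and a two-rectangle Haar feature described above can be sketched in a few lines of numpy. This is a minimal illustration, not the thesis implementation; the function names are my own:

```python
import numpy as np

def integral_image(img):
    # ii(x, y) = sum of all pixels above and to the left of (x, y), inclusive
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r, c, h, w):
    # Sum over the h x w rectangle with top-left corner (r, c),
    # using at most four lookups into the integral image
    total = ii[r + h - 1, c + w - 1]
    if r > 0:
        total -= ii[r - 1, c + w - 1]
    if c > 0:
        total -= ii[r + h - 1, c - 1]
    if r > 0 and c > 0:
        total += ii[r - 1, c - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    # Difference between the sums of two horizontally adjacent rectangles
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)
```

Because any rectangle sum costs only a handful of array lookups, every Haar feature is evaluated in constant time regardless of its size, which is what makes the detector fast.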

AdaBoost is the step of the Viola-Jones method used to build a classifier by selecting important features. As illustrated in the figure, for a 24x24 window the total number of Haar features extracted with the five rectangle types exceeds 63,000, which would take far too long to classify exhaustively, so the learning process must discard the vast majority of redundant features and focus on a small set of important ones. Feature selection is done with a simple modification of AdaBoost: at each stage of the boosting process, AdaBoost selects a new weak classifier.

Cascading is the fourth step in the Viola-Jones algorithm; it speeds up detection by arranging sets of classifiers in stages, dividing the features among the classifiers and discarding negative examples early. The cascade of classifiers is constructed to improve detection performance and reduce computation time, and the boosted classifiers are built to reject many of the negative sub-windows. The first classifier eliminates a large number of negative examples with very little processing, then the second classifier processes the windows that remain, and so on.
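The early-rejection logic of the cascade can be sketched in plain Python; the stage scoring functions below are toy placeholders standing in for trained boosted classifiers:

```python
def cascade_classify(window, stages):
    """Attentional cascade: `stages` is a list of (score_fn, threshold) pairs,
    ordered from cheapest to most discriminative. A window must pass every
    stage; most negative windows are rejected by the first few stages with
    very little computation."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected early: no further stages are evaluated
    return True  # survived all stages: candidate face

# Toy stages standing in for boosted classifiers
stages = [
    (lambda w: sum(w), 1.0),  # cheap first stage
    (lambda w: max(w), 0.5),  # stricter stage, run on survivors only
]
```

The design choice is that the cheap stages see every sub-window while the expensive ones see only the few that survive, so the average cost per window stays very low.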

Figure 2.1: Types of Haar Features

Summary

Feature Selection

Usually the KLT tracker uses Shi and Tomasi's algorithm to automatically detect good features to track, but points can be selected according to any criterion. Good features to track are corners, pixels in regions of irregular intensity, and points on non-straight edges. Distinguishing good-to-track from bad-to-track features is done by comparing the smaller of the two computed eigenvalues with a threshold.

During feature selection, the algorithm sorts the smaller eigenvalues in descending order and chooses the feature coordinates at the top of the sorted list.
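Shi and Tomasi's criterion, scoring each pixel by the smaller eigenvalue of its local gradient structure, can be sketched with numpy. This is a naive loop version for clarity, assuming a grayscale image; it is not the thesis code:

```python
import numpy as np

def shi_tomasi_scores(img, win=3):
    """Score each pixel by the smaller eigenvalue of the 2x2 structure
    tensor summed over a win x win window; high scores mark corners."""
    Iy, Ix = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    h, w = img.shape
    r = win // 2
    scores = np.zeros((h, w))
    for i in range(r, h - r):
        for j in range(r, w - r):
            Sxx = Ixx[i - r:i + r + 1, j - r:j + r + 1].sum()
            Syy = Iyy[i - r:i + r + 1, j - r:j + r + 1].sum()
            Sxy = Ixy[i - r:i + r + 1, j - r:j + r + 1].sum()
            # Closed-form smaller eigenvalue of [[Sxx, Sxy], [Sxy, Syy]]
            scores[i, j] = 0.5 * ((Sxx + Syy) - np.hypot(Sxx - Syy, 2 * Sxy))
    return scores

def top_features(scores, k):
    # Sort scores in descending order and keep the top-k coordinates
    flat = np.argsort(scores.ravel())[::-1][:k]
    return np.column_stack(np.unravel_index(flat, scores.shape))
```

Flat regions and straight edges score near zero (one or both eigenvalues vanish), so sorting leaves only corner-like points at the top of the list.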

Feature Extraction

It is impossible to determine where a pixel moved between frames from local information at a single pixel. Because of this, we do not track individual pixels; instead we take windows of pixels and look for good features.

Tracking

Summary

Introduction to Deep Learning

Deep Architecture

Deep Learning aims to automatically discover these abstractions from the lowest to the highest level, with unsupervised learning algorithms. It allows the network to discover features on its own instead of requiring a predefined set of all possible abstractions. Deep Learning can automatically learn important underlying features, which accounts for the popularity of deep architectures and the broad applications of deep machine learning.

Feature reuse is one advantage of deep learning, which explains the power of distributed representations: after training on a number of examples and learning the features, deep learning allows new examples to be added to the existing data and the network to be retrained for feature extraction. The main idea is greedy layerwise pretraining: a hierarchy of features is learned one level at a time, and the features extracted at each level are used as the input to the next hidden layer, creating a new transformation to learn at each level. Finally, we add a classification layer to train the network in a supervised way, combining all the layers and doing forward propagation for better feature learning.
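The greedy layerwise scheme can be sketched with numpy: each layer is a small autoencoder trained to reconstruct its input, and its hidden activations become the input of the next layer. This is a toy illustration, not the thesis code; the layer sizes and learning rate are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, lr=0.5, epochs=200):
    """Train one autoencoder (X -> hidden -> reconstruction of X) with
    plain gradient descent; return the encoder parameters and hidden features."""
    n = X.shape[1]
    W1 = rng.normal(0, 0.1, (n, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n)); b2 = np.zeros(n)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)        # encode
        R = sigmoid(H @ W2 + b2)        # decode: reconstruct the input
        dR = (R - X) * R * (1 - R)      # squared-error gradient through sigmoid
        dH = (dR @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dR / len(X); b2 -= lr * dR.mean(0)
        W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(0)
    return (W1, b1), sigmoid(X @ W1 + b1)

# Greedy layerwise pretraining: the features of one layer feed the next
X = rng.random((50, 8))
features, stack = X, []
for n_hidden in (6, 4):
    params, features = pretrain_layer(features, n_hidden)
    stack.append(params)
# A supervised classification layer would now be added on top of `features`
```

Each autoencoder is trained in isolation, so no labels are needed until the final classification layer is attached.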

Importance of Deep learning

Multilayer Neural Network

The above equation shows the feedforward neural network that calculates the output at each layer. To train this network, first take the input samples as (x(i), y(i)), where x(i) is the input vector and y(i) is the output vector. In backpropagation, the weights are adjusted depending on the error against the expected output.

Since the cost function is non-convex, gradient descent may converge to a local optimum; in practice, however, gradient descent works quite well.

Figure 4.2: Multilayer Neural Network

z_i^{(2)} = \sum_{j=1}^{n} W_{ij}^{(1)} x_j + b_i^{(1)}
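The feedforward pass and backpropagation update described above can be sketched for a tiny network; here a 2-4-1 network is trained on XOR as a stand-in for the (x(i), y(i)) training pairs. This is a minimal numpy sketch, not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy training pairs (x(i), y(i)): the XOR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

loss0 = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2)
for _ in range(5000):
    # Feedforward: z(2) = X W(1) + b(1), a(2) = sigmoid(z(2)), and so on
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    # Backpropagation: adjust the weights by the error vs the expected output
    d2 = (A2 - y) * A2 * (1 - A2)
    d1 = (d2 @ W2.T) * A1 * (1 - A1)
    W2 -= 0.5 * A1.T @ d2; b2 -= 0.5 * d2.sum(0)
    W1 -= 0.5 * X.T @ d1; b1 -= 0.5 * d1.sum(0)
loss = np.mean((A2 - y) ** 2)
```

Because the squared-error cost is non-convex in the weights, different random initializations can land in different local optima, which is the caveat noted above.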

Summary

In my thesis I prepared a database for testing on videos, and my database is similar to real-world examples. I tested my Stacked Autoencoders deep learning algorithm with YouTube videos, a real-time webcam, the Chokepoint database and the IITH CCTV database.

Normalize

PCA and ZCA Whitening
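PCA and ZCA whitening decorrelate the input pixels and equalize their variances before feature learning. A minimal numpy sketch of ZCA whitening follows; the helper is my own, assuming rows are examples, and is not the thesis preprocessing code:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """Rotate into the PCA basis, rescale each component to unit variance,
    then rotate back (ZCA), which keeps the result close to the original data."""
    Xc = X - X.mean(axis=0)            # zero-mean the data
    cov = Xc.T @ Xc / len(Xc)          # feature covariance
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T   # ZCA whitening matrix
    return Xc @ W
```

Dropping the final `U.T` rotation gives plain PCA whitening; `eps` regularizes near-zero eigenvalues.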

An autoencoder is an unsupervised feature learning algorithm that uses backpropagation with the output values set equal to the inputs. Autoencoders provide a method to automatically learn features from unlabeled data, allowing unsupervised learning; they try to discover generic features of the data by learning an approximation of the identity function through important sub-features. The long training time of earlier deep neural networks was a major downfall, making the technique impractical for practical learning.

Preprocessing

Feature Extraction using Stacked Autoencoders

Logistic regression is a simple classification algorithm for learning to decide whether a grid of pixel intensities represents class A or class B. In logistic regression, we try to predict the probability that a given case belongs to class A or to class B.
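A minimal numpy sketch of this two-class logistic regression follows; the toy random data stands in for pixel-intensity features, and this is not the thesis classifier:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy two-class data: class B samples are brighter on average
XA = rng.normal(0.3, 0.1, (100, 4))   # class A (label 0)
XB = rng.normal(0.7, 0.1, (100, 4))   # class B (label 1)
X = np.vstack([XA, XB])
y = np.r_[np.zeros(100), np.ones(100)]

# Gradient descent on the logistic (cross-entropy) loss
w, b, lr = np.zeros(4), 0.0, 0.5
for _ in range(300):
    p = sigmoid(X @ w + b)            # predicted P(class B | x)
    w -= lr * X.T @ (p - y) / len(X)
    b -= lr * (p - y).mean()

acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
```

In the thesis pipeline this classifier would sit on top of the features produced by the stacked autoencoders.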

Databases

This dataset consists of three camera views with different lighting conditions, with one person at a time, and multiple persons, walking in front of the CCTV.

Figure 8.1: Subjects of Webcam database

Summary

Face recognition from real-time video using Stacked Autoencoders was successful for all the databases I prepared and for the Chokepoint database extracted from the Internet. Tables were prepared for two databases: one for real-time webcam face recognition, and a second showing the remaining results.

Figure 9.1: Result1 Recognizing Modi from Real Time video

Summary

Future work

Figures

Figure 1.1: Face Recognition at different lighting conditions
Figure 1.2: Face Recognition in different poses
Figure 1.3: Face Recognition at low resolution
Figure 1.4: Face Recognition in occlusion
