A REVIEW ON MOTION GESTURE RECOGNITION FOR HUMAN-MACHINE INTERACTION

International Journal of Recent Advances in Engineering & Technology (IJRAET)

ISSN (Online): 2347-2812, Volume-1, Issue-2, 2013

Jyothi Lakshmi P

Department of TE, VKIT, Bangalore, India.

[email protected]

K R Rekha

Department of ECE, SJBIT, Bangalore, India.

[email protected]

K R Nataraj

Department of ECE, SJBIT, Bangalore, India.

[email protected]

Abstract- In the field of computer vision and pattern analysis, vision-based hand gesture recognition is one of the most challenging problems, since it faces algorithmic difficulties such as dynamic backgrounds, hand segmentation, camera calibration, processing speed and, in some systems, the need for external data gloves. To overcome the disadvantages of the data-glove-based method, a vision-based method is used that reliably detects and tracks the position of the hands using computer vision techniques. This paper provides a detailed analysis and discussion of gesture recognition systems for the human-machine interface.

Index Terms— gesture recognition system, human machine interface, Human Computer Interaction (HCI), DTW, HMM, SVM, Hand segmentation.

I. INTRODUCTION

Advances in computer technology have made user-computer interaction a very active field that has seen encouraging progress. The technology has gradually shifted from computer-centric interaction to human-centered, multimedia, multimodal interaction, including lip reading, head motion tracking, hand gesture recognition, face recognition, facial expression recognition and body gesture recognition. Research on hand gesture recognition relates to computer graphics, robot kinematics, medicine and other disciplines, and its applications are very extensive. Hand gesture recognition research is mainly concerned with the analysis of dynamic image sequences, which usually involves hand gesture segmentation, gesture analysis (gesture feature extraction and gesture modeling) and gesture recognition [1]. Gesture segmentation refers to separating the hand gestures from a continuous image sequence that contains them.

Existing hand gesture recognition systems can be divided mainly into data-glove-based systems and vision-based ones. Data-glove-based systems use extra sensors to capture human hand motion, which makes hand configurations and movements easier to collect. However, this equipment is usually expensive and often cumbersome for users. Vision-based systems, which are independent of dedicated external devices, are becoming more and more feasible for HCI. Early research on vision-based hand gesture recognition usually needed the help of colored gloves or markers to make the image processing easier. Current research focuses more on tracking the bare hand and recognizing hand gestures using color, shape or depth information. However, those systems generally require a single person or a single large, centered hand in the camera view, together with a uniform background and invariable illumination [3]. Meanwhile, a vision-based hand gesture recognition system needs to fulfill requirements including real-time performance, accuracy and robustness.

Hand gesture spotting and recognition is an active area of research in the vision community, notably for sign language recognition. Several methods with potential applications in advanced hand gesture interfaces for Human Computer Interaction (HCI) have been introduced, and they differ from one another in their models. Some of these models are Dynamic Time Warping (DTW), Hidden Markov Models (HMMs) and Neural Networks [5].

This paper is organized as follows. Section 2 gives a brief review of motion gesture recognition. Section 3 presents an overview of motion gesture recognition for the human-machine interface, which includes segmentation, feature extraction and SVM classification. A conclusion is given in Section 4.

II. RELATED WORK

Hand gesture segmentation is the key premise of gesture analysis and identification, and the quality of the segmentation directly affects the recognition rate. Effective use of information such as color, motion and geometry is the key to the study [1], [2].

Hand gestures against a complex background are very difficult to separate, and no mature theory exists as a guide. Existing approaches are:

(1) Adding restrictions that simplify the separation of gesture and background areas, such as using white or black walls or dark clothing to simplify the background, or wearing special gloves to emphasize the foreground.

(2) Using a large-capacity database of hand gesture shapes. For example, Cui Yuntao of the computer science department of Michigan State University set up a database that includes all kinds of gestures in different positions and scales as templates for image recognition based on template matching.

(3) Stereo viewing. For example, Gluckman of the department of computer science at Columbia University used two reflected images that are not in the same plane to calculate the distance between the camera and the object, and then separated the hand gesture according to that distance.

Separating the hand signal region from the image is the first and most essential step in hand signal recognition. The accuracy of the hand signal segmentation directly affects the quality of the final recognition.

In monocular video-based hand signal recognition, the central task is to separate the hand signal area from the background. The main difficulties lie in varied backgrounds, unforeseen environmental factors and changes in the shape of the hand. The authors of [7] demonstrate the usefulness of background information in hand segmentation.

This method emphasizes the importance of background information as a supplement to traditional skin color segmentation. Background information is especially helpful in weakening the interference of non-uniform illumination with the segmentation results, and the idea can be extended to the segmentation of other objects, such as faces.

Paper [4] proposes a novel adaptive skin color model for hand segmentation. Earlier algorithms model the YCbCr skin color by fitting an ellipse or a bounding box, which does not represent skin color accurately because non-skin pixels are included as skin pixels. Different ethnicities have different skin colors in appearance, and the authors show that their model can be tailored to any individual, so it is useful for personalized systems and applications.

An automatic hand gesture spotting method for the numbers 0 to 9 using Conditional Random Fields (CRFs) is presented in [5]. It performs the hand gesture spotting and recognition tasks simultaneously, which makes it suitable for real-time applications and removes the time delay between the segmentation and recognition tasks. The method can reliably recognize isolated gestures and spot meaningful gestures embedded in the input video stream [5]. As future work, the authors plan to improve the spotting accuracy using a short gesture detector and fingertip detection for the gesture path, in conjunction with a multi-camera system. In addition, they expect ongoing work on extending the proposed CRF-based non-gesture model to Latent-Dynamic Conditional Random Fields.

Hand gesture recognition using a shape-based approach includes several steps: thumb detection, smudge elimination, finger counting, orientation detection, etc. [6]. Visually impaired people can use hand gestures to write text in electronic documents such as MS Office or Notepad. The approach aims at simplicity and ease of implementation: it does not require any training or post-processing, and it gives a higher recognition rate with minimum computation time.

Binary-state tactile patterns inside the prosthetic socket, related to hand gestures, were presented in [8]. The discrete patterns show a high possibility of controlling the prosthesis directly with three degrees of freedom using a simple classification algorithm.

Re-attachment of the socket degraded the sensitivity, whereas the specificity remained almost the same. The classification performance therefore has to be increased, which could be achieved by training the user and by carefully re-attaching the socket. The specificity varied less than the sensitivity, which means that the user can suppress unwanted movement of the prosthesis with higher probability.

The authors of [7] use the simplest model, a single Gaussian, to characterize skin color in the normalized RGB color space and in the YCbCr color space respectively. Although the single Gaussian is unable to describe the correlation of color information across two or three channels, satisfactory results are still obtained. As future work, they intend to combine this method with a Gaussian mixture model or other skin color models to obtain better performance [7].
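The single-Gaussian idea can be illustrated with a minimal sketch. It assumes an independent Gaussian per chroma channel (Cb, Cr), which matches the remark that channel correlation is not modeled; the training pixels, the threshold of 9.0 and all function names are hypothetical and are not taken from [7].

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to the Cb and Cr chroma components (BT.601, full range)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def fit_skin_model(skin_pixels):
    """Fit an independent Gaussian (mean, variance) to each chroma channel."""
    chroma = [rgb_to_cbcr(*p) for p in skin_pixels]
    n = len(chroma)
    means = [sum(c[k] for c in chroma) / n for k in range(2)]
    # A small floor on the variance avoids division by zero for flat samples.
    variances = [
        max(sum((c[k] - means[k]) ** 2 for c in chroma) / n, 1e-6)
        for k in range(2)
    ]
    return means, variances


def is_skin(pixel, means, variances, threshold=9.0):
    """Label a pixel as skin when its normalised squared distance is small.

    The threshold is an assumed value, not one reported in the literature.
    """
    chroma = rgb_to_cbcr(*pixel)
    distance = sum((chroma[k] - means[k]) ** 2 / variances[k] for k in range(2))
    return distance <= threshold


# Hypothetical skin-coloured training pixels (R, G, B).
TRAINING_PIXELS = [(224, 172, 140), (198, 134, 100), (240, 190, 160),
                   (180, 120, 90), (210, 150, 120)]
MEANS, VARIANCES = fit_skin_model(TRAINING_PIXELS)
```

With this model, a pixel such as (215, 155, 125) falls inside the skin cluster, while a saturated blue pixel such as (30, 60, 200) is rejected. Working on the chroma channels only is a common design choice because it reduces the influence of brightness changes.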

A hierarchical gesture recognition algorithm was proposed in [10] to recognize a large number of gestures using a hidden Markov model, Kalman filtering and graph matching, and it achieved a recognition rate of 89.6%. A vision-based gesture recognition system was proposed in [11]; it used a neural network, PCA, and clustering and encoding techniques. After closely reviewing the reported works related to hand gesture detection and recognition, the proposed system is designed using image preprocessing and feature extraction techniques (Gabor filters (GF) together with PCA) that have not been used in the previous works.

Gabor Filter: GFs have been used for many years to extract features from images. The Gabor wavelet representation of an image is the convolution of the image with a family of Gabor kernels. A 2-D Gabor filter is the product of an elliptical Gaussian at any rotation and a complex exponential representing a sinusoidal plane wave.
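The description above can be sketched as follows. The kernel is built as an elliptical Gaussian envelope multiplied by a complex plane wave, using the commonly used parameterization with wavelength, orientation theta, width sigma, aspect ratio gamma and phase psi. These parameter names, their values and the striped test image are assumptions made for illustration only.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Complex 2-D Gabor kernel of odd side length `size`."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Elliptical Gaussian envelope times a complex sinusoidal carrier.
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = cmath.exp(1j * (2.0 * math.pi * xr / wavelength + psi))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def convolve_valid(image, kernel):
    """Magnitude of the 2-D convolution, computed only where the kernel fits."""
    k = len(kernel)
    out = []
    for i in range(len(image) - k + 1):
        out_row = []
        for j in range(len(image[0]) - k + 1):
            acc = 0j
            for u in range(k):
                for v in range(k):
                    # The kernel is flipped, as required for a true convolution.
                    acc += image[i + u][j + v] * kernel[k - 1 - u][k - 1 - v]
            out_row.append(abs(acc))
        out.append(out_row)
    return out


# Hypothetical test image: vertical stripes with a period of 4 pixels.
STRIPES = [[1.0 if (x % 4) < 2 else 0.0 for x in range(7)] for _ in range(7)]
RESPONSE_0 = convolve_valid(STRIPES, gabor_kernel(7, 4.0, 0.0, 2.0))[0][0]
RESPONSE_90 = convolve_valid(STRIPES, gabor_kernel(7, 4.0, math.pi / 2, 2.0))[0][0]
```

In this example the kernel whose carrier varies along x (theta = 0) responds much more strongly to the vertical stripes than the kernel rotated by 90 degrees. This orientation selectivity is what makes a bank of such kernels useful as a feature extractor.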

For most applications, it is necessary first to transform the data into some suitable representation before training a neural network. In practice, it is always advantageous to apply preprocessing transformations to the input data before it is presented to a network, and the choice of preprocessing is one of the most significant factors in determining the performance of the final system. Another important way in which a network's performance can be improved, sometimes dramatically, is by reducing the dimensionality of the input data. Dimensionality reduction involves forming linear or nonlinear combinations of the original variables to generate inputs for the network. Such combinations of inputs are sometimes called features, and the process of generating them is called feature extraction. The principal motivation for dimensionality reduction is that it can help to alleviate the worst effects of high dimensionality [9].
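As an illustration of a linear dimensionality reduction, the following minimal sketch computes the first principal component of a small data set by power iteration and projects the data onto it. It is a simplified stand-in for a full PCA, not the procedure of [9]; the data values and function names are hypothetical.

```python
import math


def mean_center(data):
    """Subtract the per-dimension mean from every sample."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_component(cov, iterations=200):
    """Dominant eigenvector of `cov` found by power iteration.

    Assumes the start vector is not orthogonal to the dominant eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Reduce each sample to its coordinate along one component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


# Hypothetical 2-D samples that lie on the line y = 2x.
DATA = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
CENTERED = mean_center(DATA)
COMPONENT = leading_component(covariance(CENTERED))
REDUCED = project(CENTERED, COMPONENT)
```

Here the two original variables are replaced by a single feature per sample without losing information, because the samples vary along one direction only. On real descriptors several components would be kept and some variance would be discarded.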

Classification-based and descriptor-matching schemes such as [12], [13] have been used for action recognition. However, for large-scale action recognition problems, where the training database consists of action videos with thousands of labels, such a matching scheme may require tremendous amounts of time to compute similarities or distances between actions. The computational cost increases quadratically with the dimension of the action (frame) descriptors. Reducing the dimensionality of the descriptors can speed up the computation, but it tends to trade off against recognition accuracy. In this regard, an efficient action recognition system capable of rapidly retrieving actions from a large database of action videos is highly desirable.

III. OVERVIEW OF MOTION GESTURE RECOGNITION

The hand is first detected in the live video stream. Hand segmentation is then computed using a skin colour detection or motion detection method.
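The motion detection option can be sketched with simple frame differencing, under the assumption that the frames are grayscale images stored as lists of rows. The threshold value and the function names are hypothetical; a practical system would read the frames from the webcam and would probably combine the motion cue with the skin colour cue.

```python
def motion_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose intensity changed by more than `threshold`."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Smallest box (top, left, bottom, right) around all set pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Two hypothetical 4x4 grayscale frames: a bright object appears in column 2.
FRAME_A = [[10, 10, 10, 10] for _ in range(4)]
FRAME_B = [[10, 10, 10, 10],
           [10, 10, 200, 10],
           [10, 10, 200, 10],
           [10, 10, 10, 10]]
MASK = motion_mask(FRAME_A, FRAME_B)
BOX = bounding_box(MASK)
```

For these two frames the moving region is reported as the box (1, 2, 2, 2), i.e. rows 1 to 2 in column 2. The later marker detection steps can then be restricted to that region.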

The processing steps of the system (block diagram) are:

1. Webcam
2. Hand segmentation
3. Detect fingers or color cap on hand
4. Track the pointer
5. Mark the path of the pointer and make binary pattern
6. Filter the binary map by morphological operation
7. Feature extraction from pattern
8. Train/classify pattern using SVM
9. Perform relevant operation

The next important step is to detect the important point, i.e. the marker on the hand. We mainly focus on a motion gesture recognition system in which the user draws the pattern in the air. Hence the efficiency of the system depends on accurate marker detection. The system requires two markers: one marker guides the pattern, whereas the other is used to start and stop the motion gesture. There are two solutions for marker detection:

1. Detecting a fingertip and using it as a marker along with the thumb.

2. Wearing two caps of different colours on two fingers.

Both methods will be tested, and the more accurate one will be selected.

The coordinates of the marker in consecutive frames will be recorded and used to create a binary pattern in an image. After the binary pattern has been created, any noise will be filtered out using morphological operations.
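A minimal sketch of this step is given below. It assumes that the marker coordinates are integer (x, y) pixel positions, that the stroke is thickened with a 3x3 dilation, and that noise is removed with a morphological opening (erosion followed by dilation). The text does not specify the structuring element or the operations, so these choices and all names are illustrative.

```python
def draw_path(points, width, height):
    """Rasterise marker coordinates (x, y) into a binary image, joining
    consecutive points with straight segments."""
    image = [[0] * width for _ in range(height)]
    # A single point is paired with itself so that it is still drawn.
    for (x0, y0), (x1, y1) in zip(points, points[1:] or points):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for s in range(steps + 1):
            x = round(x0 + (x1 - x0) * s / steps)
            y = round(y0 + (y1 - y0) * s / steps)
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = 1
    return image


def _window(image, row, col):
    """Values in the 3x3 window around (row, col); outside pixels count as 0."""
    height, width = len(image), len(image[0])
    return [
        image[r][c] if 0 <= r < height and 0 <= c < width else 0
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
    ]


def dilate(image):
    """Set a pixel if any pixel in its 3x3 window is set."""
    return [[1 if any(_window(image, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def erode(image):
    """Keep a pixel only if every pixel in its 3x3 window is set."""
    return [[1 if all(_window(image, r, c)) else 0
             for c in range(len(image[0]))] for r in range(len(image))]


def opening(image):
    """Erosion followed by dilation; removes specks smaller than the window."""
    return dilate(erode(image))


# Hypothetical gesture: a horizontal stroke, thickened to three pixels.
STROKE = dilate(draw_path([(2, 2), (7, 2)], width=10, height=7))
NOISY = [row[:] for row in STROKE]
NOISY[5][8] = 1  # isolated noise pixel
CLEANED = opening(NOISY)
```

In this example the opening removes the isolated pixel and leaves the thickened stroke unchanged. Note that an opening applied directly to a one-pixel-wide path would erase the path itself, which is why the stroke is thickened first.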

Another very important step of the project is to extract meaningful features from the pattern. Using more relevant features gives higher accuracy. Different feature extraction methods, such as HMM, will be tested and the best method will be chosen.
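The feature set is left open in the text. Purely as an illustration, the sketch below computes zoning features, i.e. the fraction of set pixels in each cell of a grid laid over the binary pattern. This is one simple candidate and not necessarily the method that will be chosen; the grid size and names are hypothetical.

```python
def zoning_features(image, grid_rows=2, grid_cols=2):
    """Fraction of set pixels in each cell of a grid_rows x grid_cols grid.

    Assumes the image has at least grid_rows rows and grid_cols columns.
    """
    height, width = len(image), len(image[0])
    features = []
    for gr in range(grid_rows):
        r0 = gr * height // grid_rows
        r1 = (gr + 1) * height // grid_rows
        for gc in range(grid_cols):
            c0 = gc * width // grid_cols
            c1 = (gc + 1) * width // grid_cols
            cells = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cells) / len(cells))
    return features


# Hypothetical 4x4 pattern whose top-left quadrant is filled.
PATTERN = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
FEATURES = zoning_features(PATTERN)
```

For this pattern the feature vector is [1.0, 0.0, 0.0, 0.0]. The vector has a fixed length regardless of the image size, which is convenient for a classifier that expects inputs of constant dimension.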

The extracted features will be used to train the classifier at the initial stage. A Support Vector Machine (SVM) will be used as the classifier. Once the classifier has been trained with sufficient data, the system will be ready to interact with the user.
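The training and classification stage can be sketched with a linear two-class SVM trained by sub-gradient descent on the regularized hinge loss. This is a minimal illustration under assumed settings (learning rate, regularization weight, number of epochs) with hypothetical feature vectors; a real system would more likely rely on an established SVM library and would need a multi-class scheme such as one-versus-rest for several gestures.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Train weights w and bias b on the hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample violates the margin: move the boundary towards it.
                w = [wi + learning_rate * (y * xi - reg * wi)
                     for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Only the regularization term contributes to the update.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical 2-D feature vectors for two gesture classes.
SAMPLES = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0],
           [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
LABELS = [1, 1, 1, -1, -1, -1]
WEIGHTS, BIAS = train_linear_svm(SAMPLES, LABELS)
```

After training, a call such as predict(WEIGHTS, BIAS, [0.95, 0.05]) assigns a new feature vector to one of the two classes. The predicted label can then be mapped to the operation the system should perform.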

IV. CONCLUSION

Hand gesture recognition research is mainly concerned with the analysis of dynamic image sequences, which usually involves hand gesture segmentation and gesture analysis. The hand is detected by means of hand segmentation. The marker detection methods are compared and the more accurate one is adopted. The coordinates of the marker are recorded and a binary pattern is created. Features are extracted from the pattern images, and the extracted information is recognized by an SVM or ANN. This paper provides a detailed analysis and discussion of gesture recognition systems for the human-machine interface.

REFERENCES

[1] Bao Hong Zhao Xinggui, "Study on Hand Gesture Segmentation", University of Science and Technology Beijing, Beijing, China, 2010.

[2] Yishen Xu, Jihua Gu, Zhi Tao, Di Wu, "Bare Hand Gesture Recognition with a Single Color Camera", Soochow University, Suzhou, China, 2009.

[3] H. Francke, J. Ruiz-del-Solar, and R. Verschae, "Real-time hand gesture detection and recognition using boosted classifiers and active learning," Pacific-Rim Symposium on Image and Video Technology, pp. 533-547, 2007.

[4] Ahmad Yahya Dawod, Junaidi Abdullah, "Adaptive Skin Color Model for Hand Segmentation", Multimedia University, Cyberjaya, Malaysia, 2010.

[5] Mahmoud Elmezain, Ayoub Al-Hamadi, Bernd Michaelis, "A Robust Method for Hand Gesture Segmentation and Recognition Using Forward Spotting Scheme in Conditional Random Fields", Otto-von-Guericke-University, Magdeburg, Germany, 2010.

[6] Meenakshi Panwar, "Hand Gesture Recognition based on Shape Parameters", Uttar Pradesh, India, 2011.

[7] Wai Wang, Jing Pan, "Hand Segmentation using Skin color and Background information", Tianjin, Xian, China, 15-17 July, 2012.

[8] Yuichiro Honda, Stefan Weber, "Stability analysis for tactile pattern based recognition system for hand gestures", EMBS Cité Internationale, Lyon, France, 2007.

[9] Yonas Fantahun Admasu, Kumudha Raimond, "Ethiopian Sign Language Recognition Using Artificial Neural Network", Addis Ababa, Ethiopia, 2010.

[10] A. Shamaie and A. Sutherland, "Accurate Recognition of Large Number of Hand Gestures," 2nd Iranian Conference on Machine Vision and Image Processing, K.N. Toosi University of Technology, Tehran, Iran, 13-15 February 2003.

[11] Y. Zhang and J. Yuan, "Gesture Recognition for Human-Computer Interaction Using Neural Networks," 8th International Conference on Neural Information Processing, Shanghai, China, pp. 735-740, November 14-18, 2001.

[12] A. Efros, A. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," Proc. IEEE Int'l Conf. Computer Vision, vol. 2, pp. 726-733, 2003.

[13] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," Proc. IEEE Int'l Conf. Computer Vision, vol. 2, pp. 1395-1402, 2005.



