
1.4 Vision based hand posture recognition: the information processing step

Figure 1.6: General block diagram representation of a hand posture recognition unit for CBA systems. (Block labels: input gesture image; information acquisition; information analysis: hand localization, hand posture modelling; feature extraction; recognition; decision making; output: matched gesture.)

Testing refers to verifying the performance of the recognition unit in accurately classifying the test patterns based on the decision function derived during training. The correct classification (CC) accuracy of the hand posture recognition unit is defined as

$$\mathrm{CC} = \frac{\text{Total number of correctly classified test patterns}}{\text{Total number of test patterns}} \tag{1.1}$$
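For concreteness, Eq. (1.1) amounts to the following computation. This is a minimal sketch in Python; the function name and the example label sequences are hypothetical.

```python
def correct_classification_accuracy(predicted, actual):
    """Correct classification (CC) accuracy as defined in Eq. (1.1)."""
    assert len(predicted) == len(actual) and len(actual) > 0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Example: 4 of 5 test patterns classified correctly -> CC = 0.8
print(correct_classification_accuracy(
    predicted=["fist", "palm", "point", "palm", "fist"],
    actual=["fist", "palm", "point", "fist", "fist"]))
```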

The information processing unit employs image processing algorithms to analyse the hand postures and derive the decision functions. The analysis procedure includes hand localization and hand posture modelling. The decision function is derived through feature extraction, and the decision label associated with a sample is obtained through classification. The general block diagram representation of the procedures involved in hand posture recognition is shown in Figure 1.6. These procedures are explained briefly as follows.

1.4.1 Hand localization

The primary aspect in developing a vision based CBA system is to ensure that the hand posture and its relative parameters are properly emphasised to aid information analysis. The common methods employed to highlight the posture parameters in a vision based interface are optical markers and coloured gloves. Traditional methods use retro-reflective markers or light emitting diodes (LEDs) placed at various finger joints in order to track the posture parameters [24, 55]. However, the use of such optical markers is obtrusive [55], and finding the correspondence between the markers and the relative joints is a major problem [56]. Hence, colour-coded gloves are used as an effective alternative [44, 55, 56]. The colour-coded glove is made of fabric and designed with a different colour for every joint and bone segment of the hand. These colours are used as cues to detect the segments of the hand and the hand posture parameters.

Although colour-coded gloves are simple and effective as a vision based interface, it is not desirable for the gesturer to rely on them in practical applications. Hence, glove-free and markerless vision based interfaces, in which the hand region is extracted directly from the image, are employed.

Figure 1.7: Illustration of different hand posture models. (a) 3D textured volumetric model; (b) 3D wireframe volumetric model; (c) 3D skeletal model; (d) Binary silhouette and (e) Contour. Image courtesy Wikipedia [1].

The commonly employed technique reported in the literature for hand extraction in a vision based interface is through skin colour detection [47]. Some of the methods for hand detection use background subtraction techniques [57, 58], object contours [59] or a combination of colour and edge characteristics for hand localization [60].
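As an illustration of skin colour based hand extraction, the sketch below thresholds the image in HSV space using OpenCV. The function name and the threshold values are assumptions chosen for this example; in practice they must be tuned to the camera, lighting and skin tones of the target users.

```python
import cv2
import numpy as np

def localize_hand(bgr_image):
    """Segment the hand region by thresholding skin colour in HSV space."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Illustrative skin-colour bounds; not universal values.
    lower_skin = np.array([0, 40, 60], dtype=np.uint8)
    upper_skin = np.array([25, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower_skin, upper_skin)
    # Suppress small noise and fill holes with morphological filtering.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Keep the largest connected region, assumed here to be the hand.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    return cv2.boundingRect(hand)  # (x, y, w, h) of the localized hand
```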

1.4.2 Hand posture modelling

The detected hand posture can be considered as the configuration of the hand in 3D space. Hence, describing the hand posture through information analysis involves the characterization of its spatial properties. The two approaches to spatial modelling of hand postures are the model based approach and the appearance based approach [47, 61].

The model based approach to spatial modelling of the hand involves synthesizing 3D hand models to analyse the hand posture. The important parameters of the model based approach are the angles made by the hand joints and the palm position [61]. The 3D hand models are mainly classified into volumetric and skeletal models. The volumetric models describe either the 3D visual appearance or the 3D geometric appearance of the human hand. The geometric appearance in volumetric modelling is achieved through the use of generalised cylinders and superquadrics, which encompass cylinders, spheres, ellipsoids and hyper-rectangles. The skeletal models are constructed using simple geometric structures such as rectangular segments and lines. Illustrations of these different hand posture models are shown in Figure 1.7.
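The parameterization of a skeletal model can be made concrete with a small data structure. The sketch below is hypothetical (the class names and the choice of three flexion angles per finger are assumptions), but it illustrates how joint angles and palm pose together define a model based description of a posture.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Finger:
    # Flexion angles (degrees) at the three joints of one finger:
    # metacarpophalangeal, proximal and distal interphalangeal.
    mcp: float = 0.0
    pip: float = 0.0
    dip: float = 0.0

@dataclass
class SkeletalHandModel:
    """Minimal skeletal hand model: palm pose plus per-finger joint angles."""
    palm_position: Tuple[float, float, float] = (0.0, 0.0, 0.0)     # x, y, z
    palm_orientation: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # roll, pitch, yaw
    fingers: List[Finger] = field(
        default_factory=lambda: [Finger() for _ in range(5)])

    def parameter_vector(self):
        """Flatten the model into a parameter vector for matching."""
        params = list(self.palm_position) + list(self.palm_orientation)
        for f in self.fingers:
            params += [f.mcp, f.pip, f.dip]
        return params  # 6 palm parameters + 5 fingers x 3 angles = 21
```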

Unlike the model based approach, the appearance based approaches are based on the projection of the 3D object onto a 2D plane. Therefore, the appearance based models are 2D images of the hand postures. This implies that 2D appearance based modelling does not recover the entire hand posture, resulting in a loss of information in comparison to 3D modelling methods. However, the computational cost of fully recovering the 3D hand posture state is very high for real-time recognition, and slight variations in the model parameters greatly affect the system performance. By contrast, processing the 2D appearance based models offers low computational cost and high accuracy for a modest gesture vocabulary [62]. Thus, the 2D models are well suited for real-time processing in CBA systems.
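A 2D appearance based model can be as simple as a normalised binary silhouette of the localized hand. The sketch below shows one illustrative way to build such a template with OpenCV; Otsu thresholding and the 64x64 template size are assumptions, not prescriptions.

```python
import cv2

def make_silhouette_template(gray_hand_roi, size=(64, 64)):
    """Build a normalised 2D binary-silhouette template of a hand posture.

    The ROI is assumed to contain the localized hand on a darker
    background, so Otsu thresholding separates hand from background.
    """
    _, binary = cv2.threshold(gray_hand_roi, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Resize to a fixed template size so all postures are comparable.
    return cv2.resize(binary, size, interpolation=cv2.INTER_NEAREST)
```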

1.4.3 Feature extraction

The general approach to deriving the decision function is to analyse a unique set of visual features that accurately represent the hand postures. The procedure of deriving the features that describe a given object is known as feature extraction, and it is one of the most crucial steps that directly influence the performance of hand posture based CBA systems. The features employed for describing the hand posture vary depending on the type of hand posture model. In the case of 3D hand models, the direct parameters defining the hand postures, such as the joint angles, the palm position, and the height and width of the fingers, can be accurately estimated, and they form the feature set representing the hand postures [61].

In the appearance based models, the 2D images of the hand postures are used as templates. The features describing the hand posture images can be derived either from the spatial domain or the transform domain representation of the binary hand shapes or the gray-level hand images. In the spatial domain representation, the features are derived directly by analysing the pixel values constituting the hand posture image. The transform domain representation is the projection of the image from the spatial domain onto another domain in such a way that the distinct characteristics of the image are emphasised. Some of the image properties characterised by the extracted features are the spatial distribution of the intensity values, the magnitude and orientation properties of the image gradients or edges, and the shape. These feature descriptors are derived either from the gray-level images or the binary silhouette images of the hand posture. Other visual features derived from the appearance based models include geometric features such as the number of extended fingers, their spatial positions and their inclination angles. Among these features, shape is an important visual feature and has been successfully used for representing hand postures. The computational requirements of shape based object analysis are lower than those of processing gray-level and colour images.
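For example, a few spatial domain shape features can be read directly off a binary silhouette. The sketch below is illustrative: the feature set is a minimal choice, and the convexity-defect depth threshold used to estimate the number of extended fingers is a rough heuristic, not a validated rule.

```python
import cv2
import numpy as np

def shape_features(binary_silhouette):
    """Simple spatial-domain shape features of a binary hand silhouette."""
    contours, _ = cv2.findContours(binary_silhouette, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, closed=True)
    compactness = (perimeter ** 2) / area if area > 0 else 0.0
    # Deep convexity defects roughly correspond to gaps between fingers.
    hull = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull)
    finger_gaps = 0
    if defects is not None:
        depths = defects[:, 0, 3] / 256.0  # depth is stored in fixed point
        finger_gaps = int(np.sum(depths > 0.1 * perimeter))  # heuristic
    return {"area": area, "perimeter": perimeter,
            "compactness": compactness,
            "extended_fingers": finger_gaps + 1 if finger_gaps else 0}
```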

A large number of features based on the above image properties have been reported in the area of hand posture recognition. The efficiency of the extracted features for object recognition is generally evaluated based on the compactness of representation, robustness to spatial transformations, sensitivity to noise, accuracy in classification and the complexity of computation [63]. In this context, moments are transform domain representations that are known to be efficient for shape representation [64]. Accordingly, some of the robust moment based features reported for hand posture recognition are the geometric moments [65] and the continuous orthogonal Zernike moments [66, 67]. The moment based features are simple, robust and offer compact representations.
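As a concrete example, the geometric moments, and the Hu invariants derived from them, are available directly in OpenCV. The log-magnitude transform at the end is a common normalisation included here as an illustrative choice; Zernike moments would require an additional library and are omitted from this sketch.

```python
import cv2
import numpy as np

def moment_features(binary_silhouette):
    """Moment based shape features of a binary hand silhouette."""
    m = cv2.moments(binary_silhouette, binaryImage=True)  # geometric moments
    hu = cv2.HuMoments(m).flatten()  # seven invariants to translation,
                                     # scale and rotation
    # Compress the large dynamic range of the Hu moments while
    # preserving their signs (illustrative normalisation).
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```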