Hand Posture Recognition Using Discrete Orthogonal Moments



Hand gestures in CBA systems

Hand gesture taxonomy

A gesture system with this type of linguistic structure is known as a sign language [8]. The next class under communicative gestures is the class of emblems. Unlike sign language, emblems have no linguistic structure and are simply hand gestures with specific meanings [8].

Applicability in CBA

  • Application as user interface data
  • Application as a data cue

Emblems can occur independently of speech, and the gestures in this class have standard meanings that directly replace a spoken word. The application domains of such gestures include robotic systems, avatar animation, interactive gaming and assistance systems. Such gestures also act as a language of communication in automatic sign translation systems.

Significance of hand postures in CBA

Structure and the movements of the hand

The amount of movement varies from joint to joint, and the movements of the adjacent bone segments are partly interdependent. The details of the movements associated with the joints between the adjacent bone segments are given in Table 1.1.

Figure 1.1: Illustration of anatomy of the human hand explaining the bone segments and the joints of the hand

Hand posture based user interfaces

Sensor based interfaces

The device consists of fiber optic sensors to measure finger bending and magnetic sensors to measure the orientation of the hand. The 5DT data glove consists of fiber optic sensors for measuring the joint movements of the hand [31].

Figure 1.3: Examples of hand postures to illustrate the variations in the hand shape relative to the anatomical movements of the hand joints

Vision based interfaces

In real time, the choice of viewing angle varies with each hand posture. A multi-vision system offers the advantages of accurate reconstruction of the hand posture and elimination of occlusions [44].

Merits of vision based interfaces over sensor based interfaces

Thus, to recreate the hand pose accurately, the vision-based interface should either use a moving camera or several still cameras to capture the pose images at different viewing angles. Due to the difficulties associated with the multi-vision systems, the monocular vision-based interfaces are widely used.

Vision based hand posture recognition: the information processing step

  • Hand localization
  • Hand posture modelling
  • Feature extraction
  • Classification

The approaches for spatial modeling of hand postures are the model-based approach and the appearance-based approach [47, 61]. In the appearance-based models, 2D images of the hand postures are used as templates.

Figure 1.7: Illustration of different hand posture models. (a) 3D textured volumetric model; (b) 3D wireframe volumetric model; (c) 3D skeletal model; (d) Binary silhouette and (e) Contour

Issues in vision based hand posture recognition

Segmentation errors

The hand posture recorded under normal illumination and the corresponding image histogram are shown in Figure 1.8(b) and Figure 1.9(b), respectively. Similarly, Figure 1.8(c) is an example of a hand posture image captured under relatively bright light, and the corresponding histogram is shown in Figure 1.9(c).
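
As a rough illustration of this effect, the following is a minimal sketch (not the thesis code; the file names and the use of Otsu's method are assumptions) showing how a histogram-derived global threshold drifts with illumination, which is one source of the segmentation errors discussed here.

```python
# Minimal sketch: histogram and global Otsu threshold of a hand posture
# image. File names are placeholders; Otsu is an assumed example of a
# histogram-derived threshold, not the thesis segmentation method.
import cv2

def histogram_and_otsu(path):
    """Return the 256-bin histogram and the Otsu threshold of a grayscale image."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    thresh, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return hist, thresh

# Brighter illumination shifts the histogram toward higher gray levels,
# so the derived threshold (and hence the hand/background split) drifts.
for name in ["pose_normal.png", "pose_bright.png"]:  # placeholder files
    _, t = histogram_and_otsu(name)
    print(f"{name}: Otsu threshold = {t}")
```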

Figure 1.8: Illustration of variations in the details of the hand posture image with respect to illumination changes

Geometrical distortions

  • Geometrical transformations
  • Variations in the hand posture parameters
  • Variations due to the angle of view

Structural variations in a hand posture are caused by changes in the degree of movement of the user's hand joints within the allowed range. The figure illustrates such structural variations, that is, deviations in the appearance of the hand posture.

Figure 1.11: Illustration of hand posture parameters using the hand skeleton. The joint angles represent the hand posture parameters.

Motivation for the present work

Contributions of the thesis

Organization of the thesis

The descriptor can be derived from the geometric shape in the form of the binary silhouette image obtained by the segmentation of the original image. Alternatively, the descriptor can be derived from the intensity variation in the gray level image containing the object.

Silhouette image based methods

  • Geometric features
  • Curvature scale space
  • Modified Hausdorff distance based matching
  • Fourier descriptors
  • Moments and moment invariants
  • Multi-fusion features

In [78], a hand posture recognition technique was proposed using the boundary profile of the hand postures as features. The Curvature Scale Space (CSS) representation of the hand shape is a boundary-based shape description method. In the RLP phase, features were extracted from the multiscale space representation of the hand postures.
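
A minimal sketch of the CSS idea follows (an illustration under stated assumptions, not the method of [78]): the closed boundary (x(u), y(u)) is smoothed by Gaussians of increasing σ, and the inflection points, i.e. the zero crossings of the curvature, are tracked across scales.

```python
# Minimal CSS sketch: smooth a closed contour at increasing sigma and
# locate the inflection points (sign changes of the curvature).
import numpy as np
from scipy.ndimage import gaussian_filter1d

def css_inflection_points(x, y, sigmas):
    """Map each sigma to the boundary indices where curvature changes sign."""
    result = {}
    for s in sigmas:
        xs = gaussian_filter1d(x, s, mode="wrap")   # wrap: closed contour
        ys = gaussian_filter1d(y, s, mode="wrap")
        dx, dy = np.gradient(xs), np.gradient(ys)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        kappa = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
        result[s] = np.where(np.diff(np.sign(kappa)) != 0)[0]
    return result
```

As σ grows, the boundary smooths out and the inflection points vanish in pairs; the (u, σ) positions of these events form the CSS image used as the shape descriptor.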

Figure 2.1: Illustration of smoothing of the shape boundary and the evolution of the inflection points at different scales (σ)

Gray-level image based methods

  • Edge-based Features
    • Orientation histograms
    • Hough transform
  • Image transform features
    • DCT features
    • PCA and LDA based features
    • Wavelet transform based descriptors
  • Elastic Graph matching
  • Local spatial pattern analysis
    • Local binary patterns
    • Modified census transform
    • Haar-like features
    • Scale invariant feature transform
  • Local linear embedding

The orientation probability distribution of the gradient gives the orientation histogram of the hand posture. The performance of PCA and LDA features in hand posture classification has been studied in detail. Rotational changes of the hand postures were normalized based on the Gabor wavelet responses.
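
A minimal sketch of such an orientation histogram (an illustrative assumption rather than the exact formulation in the cited works): gradient directions are quantized into bins and weighted by the gradient magnitude, giving the orientation probability distribution.

```python
# Minimal sketch: magnitude-weighted histogram of gradient orientations.
import cv2
import numpy as np

def orientation_histogram(gray, n_bins=36):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)        # angles in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)                 # probability distribution
```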

Figure 2.3: Plots of the real part of the Gabor wavelet kernels G_{ϑ,θ} obtained at 4 scales (P = 4) and 8 orientations (Q = 8).

Summary and conclusion

Some of the DOPs explored for image analysis are the discrete Tchebichef polynomials [71], the Krawtchouk polynomials [72], the Hahn polynomials, and the Racah polynomials [152]. Yap et al. [151] have shown that the discrete Hahn polynomials are a generalization of the discrete Tchebichef and Krawtchouk polynomials. This chapter presents the formulations and the spatial and frequency domain properties of the Krawtchouk and the discrete Tchebichef polynomials.

Theory of discrete orthogonal polynomials

The DOPs form the eigenfunctions of the operator Υ if, among other conditions, (i) Υ is symmetric with respect to the weight w(x) and (iii) R is a constant that is assumed to be zero. Due to these conditions, the eigenfunctions of (3.8) are the polynomials ψ_n(x) orthogonal with respect to w(x). The discrete Rodrigues formula associated with the DOP solution in (3.15) can be derived as ψ_n(x) = B_n w(x)^{-1} ∇^n [w_n(x)], where B_n is a normalizing constant and w_n(x) is the weight associated with the n-th order difference.

Formulation of the Krawtchouk polynomials

  • Rodrigues formula
  • Recurrence relation
  • Hypergeometric representation
  • Derivation of ‖K_n‖²_w
  • Weighted Krawtchouk polynomials (WKPs)

It is easy to verify that the Krawtchouk polynomials exhibit symmetry with respect to the parameters n and x [72, 157]. The Krawtchouk polynomial in (3.26) can be rewritten in the form given in (3.34). The hypergeometric representation of the Krawtchouk polynomials is thus K_n(x; p, N) = ₂F₁(−n, −x; −N; 1/p). Using the binomial theorem, the generating function for the Krawtchouk polynomials in (3.40) can be simplified as in [155, 160].
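
The following is a sketch of the WKPs via the standard three-term recurrence and the weighting K̄_n(x) = K_n(x)·sqrt(w(x)/ρ(n)) used in the Krawtchouk-moment literature [72]; the normalization conventions here are assumptions and may differ in detail from (3.26)-(3.40).

```python
# Sketch: weighted Krawtchouk polynomials on x = 0..N-1 (assumes 1 <= n_max <= N-1).
import numpy as np
from scipy.special import comb

def weighted_krawtchouk(N, p, n_max):
    """Rows of the result are Kbar_n(x; p, N-1) for n = 0..n_max."""
    x = np.arange(N)
    M = N - 1
    K = np.zeros((n_max + 1, N))
    K[0] = 1.0
    K[1] = 1.0 - x / (p * M)
    for n in range(1, n_max):      # three-term recurrence in the order n
        K[n + 1] = ((p * (M - n) + n * (1 - p) - x) * K[n]
                    - n * (1 - p) * K[n - 1]) / (p * (M - n))
    n_all = np.arange(n_max + 1)
    w = comb(M, x) * p**x * (1 - p)**(M - x)             # binomial weight w(x)
    rho = ((1 - p) / p) ** n_all / comb(M, n_all)        # squared norms rho(n)
    return K * np.sqrt(w / rho[:, None])                 # orthonormal rows
```

With P = weighted_krawtchouk(64, 0.5, 63), the rows satisfy P @ P.T ≈ I, which is the orthonormality exploited when computing moments.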

Formulation of discrete Tchebichef polynomials (DTPs)

  • Rodrigues formula
  • Recurrence relation
  • Hypergeometric representation
  • Derivation of ‖T_n‖²_w

For n > 1, the trinomial inverse relation for the discrete Tchebichef polynomials can be derived as in [71, 154]. It is easy to show from (3.54) that the DTPs are symmetric with respect to x, i.e., t_n(N−1−x) = (−1)^n t_n(x).
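
A sketch of the scaled DTPs via the well-known three-term recurrence follows (the scaling convention is an assumption and may differ from (3.54) and [71, 154]); the symmetry noted above appears as t_n(N−1−x) = (−1)^n t_n(x).

```python
# Sketch: scaled discrete Tchebichef polynomials on x = 0..N-1.
import numpy as np

def tchebichef(N, n_max):
    """Rows of the result are t_n(x) for n = 0..n_max (n_max >= 1)."""
    x = np.arange(N, dtype=float)
    T = np.zeros((n_max + 1, N))
    T[0] = 1.0
    T[1] = (2 * x + 1 - N) / N
    for n in range(2, n_max + 1):   # three-term recurrence in the order n
        T[n] = ((2 * n - 1) * T[1] * T[n - 1]
                - (n - 1) * (1.0 - ((n - 1) / N) ** 2) * T[n - 2]) / n
    return T   # satisfies t_n(N-1-x) = (-1)**n * t_n(x)
```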

Least squares approximation of functions by DOPs

Image representation using two-dimensional DOPs

Spatial domain behaviour of the DOPs

The parameters p1 and p2 control the polynomial position in the vertical (x-axis) and horizontal (y-axis) directions, respectively. From the illustration, it can also be observed that the spatial support of the polynomial increases in the x direction as the value of n increases.
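
As a minimal sketch of how the 2D moments follow separably from such 1D bases (using the weighted_krawtchouk sketch above; the matrix convention is an assumption), choosing p1 and p2 away from 0.5 shifts the emphasized region of the image, as described.

```python
# Sketch: separable 2D moment analysis/synthesis with orthonormal DOP bases.
import numpy as np

def dom_analysis(f, P1, P2):
    """2D moments M[n, m] = sum_x sum_y P1[n, x] * P2[m, y] * f[x, y]."""
    return P1 @ f @ P2.T

def dom_synthesis(M, P1, P2):
    """Reconstruction f_hat from (possibly truncated) moments M."""
    return P1.T @ M @ P2

# Example: P1 = weighted_krawtchouk(N, p1, n_max), P2 = weighted_krawtchouk(N, p2, n_max);
# truncating n_max keeps only the low-order moments of the hand shape.
```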

Figure 3.1: Plots of the WKPs for different values of p and order n. The plots illustrate the translation of K_n(x) with respect to the value of p

Frequency domain behaviour of the DOPs

Quantitative analysis

The peak frequency ω_p is the frequency at which the energy of the function is highest. It can be seen from the table that the peak frequencies of the normalized DTPs are relatively smaller than those of the WKPs of the same order. Furthermore, it is also observed that the bandwidth of the normalized DTPs increases with the order.

Short-time Fourier transform (STFT) analysis

The illustration shows that for order n < N/2 + 1, the low-frequency ESD of the polynomial increases for values of x near x = 0 and x = N. The length of the sliding window ξ(·) is chosen as 30 and the number of frequency points is 128.
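
A minimal sketch of this STFT analysis follows, with the window length 30 and 128 frequency points as stated; the Hann window is an assumption, since the window type is not restated here.

```python
# Sketch: position-dependent energy spectral density of a 1D basis function.
import numpy as np
from scipy.signal import stft

def polynomial_esd(k):
    """STFT-based ESD of the sequence k(x), x = 0..N-1."""
    f, x_pos, Z = stft(k, window="hann", nperseg=30, nfft=128)
    return f, x_pos, np.abs(Z) ** 2   # ESD over frequency and position x
```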

Figure 3.7: Plots of the 1D WKPs and corresponding ESD obtained using STFT as functions of x

Shape approximation using DOPs

Metrics for reconstruction accuracy

The similarity between the compared shapes f and f̂ is high if the corresponding MHD is small. The reconstruction accuracy of the DOMs is quantitatively compared using the values of the SSIM index and the MHD. The performance of the orthogonal moments is analyzed by varying the order of the moments used for the approximation.
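
A minimal sketch of the two metrics follows; the SSIM call uses scikit-image, and the MHD follows the usual Dubuisson-Jain definition, which is assumed here.

```python
# Sketch: SSIM between shape images and MHD between boundary point sets.
import numpy as np
from skimage.metrics import structural_similarity

def mhd(A, B):
    """Modified Hausdorff distance between point sets A, B of shape (n, 2)."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(D.min(axis=1).mean(), D.min(axis=0).mean())

def reconstruction_scores(f, f_hat, pts_f, pts_fhat):
    """Higher SSIM and lower MHD both indicate a more faithful reconstruction."""
    ssim = structural_similarity(f, f_hat, data_range=float(f.max() - f.min()))
    return ssim, mhd(pts_f, pts_fhat)
```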

Experiments on shape representation

  • Characterizing shapes using curvature properties
  • Spatial scale of the shapes
  • Variation in shapes versus reconstruction accuracy
  • Noise versus reconstruction accuracy

The shapes reconstructed from the discrete Krawtchouk and Tchebichef moment approximations of the noisy shapes are given in Figure 3.18(c). The SSIM index and MHD plots are shown in Figure 3.19(d) and Figure 3.19(e), respectively. As the order increases, the performance of the Krawtchouk moments and the discrete Tchebichef moments becomes almost similar.

Figure 3.9: Illustration of finding the concave segments of a shape from the curvature function derived from the corresponding shape boundary

Experiments on shape classification

The distance is measured in terms of the similarity in the spatial distribution of pixels. The comprehensive scores of the classification results obtained for each shape class in the test data are given by the plot in Figure 3.28. The consolidated plot of the classification results obtained for each shape class with respect to the extended training set is given in Figure 3.30.

Figure 3.24: Illustration of undistorted training sample per shape class constituting the reference dataset.

Summary

Appendix: Proof of the QMF property of the WKP basis

This chapter presents the proposed method and the experimental studies that comparatively validate the DOMs as hand posture features. The hand posture recognition system developed in this work addresses the three main issues in hand shape interpretation, including the segmentation of the forearm and the extraction of the hand region. The section on system implementation presents the procedures and techniques involved in realizing the hand posture recognition system.

Figure 4.1: Illustration of a tabletop user interface setup using a top-mounted camera for natural human-computer interaction through hand postures.

Hand posture acquisition and database development

  • Determination of camera position
  • Determination of view-angle
  • System setup
  • Development of Hand posture database

In real time, the optimal position of the camera for acquiring the hand posture depends on the application. The viewing angle (Cθ) is measured relative to the x−y plane. The variations in the camera's viewing angle with respect to the hand region are illustrated in Figure 4.4(b).

Figure 4.3: A schematic representation of the experimental setup employed for acquiring the hand posture images.

System Implementation

  • Hand detection and segmentation
  • Normalization techniques
    • Proposed method for rule based hand extraction
    • Proposed approach to orientation correction
    • Normalization of scale and spatial translation
  • Feature Extraction
    • Extraction of moment shape descriptors
    • Extraction of non-moment shape descriptors
  • Classification

Also, the width of a finger (the maximum EDT value in that cross-section) is much smaller than that of the palm and the forearm; the EDT-based rule sketched below exploits this. The orientation of the hand can be assumed to give the orientation of the hand posture.
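
A minimal sketch of such an EDT-based rule (the threshold and the hand-up orientation are illustrative assumptions, not the thesis values): the maximum EDT value per row peaks over the palm and drops over the fingers and the wrist, which can be used to cut away the forearm.

```python
# Sketch: forearm removal from a binary silhouette using the Euclidean
# distance transform (EDT); assumes the fingers point toward row 0.
import numpy as np
from scipy.ndimage import distance_transform_edt

def cut_forearm(mask, ratio=0.5):
    """Zero out rows below the palm where the silhouette narrows."""
    edt = distance_transform_edt(mask)
    profile = edt.max(axis=1)                 # max inscribed radius per row
    palm_row = int(profile.argmax())          # widest part: the palm
    narrow = np.where(profile[palm_row:] < ratio * profile.max())[0]
    out = mask.copy()
    if narrow.size:                           # first narrow row below the palm
        out[palm_row + narrow[0]:, :] = 0     # drop the wrist and forearm
    return out
```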

Figure 4.6: Schematic representation of the proposed hand posture recognition technique.

Experimental Studies and Results

Quantitative analysis of hand posture variations

The within-class standard deviation plot of Pratt's FOM shown in Figure 4.16 shows the variability in the FOM values relative to each class. The plots are created by averaging the correlation values obtained with respect to the samples in each posture class. It is known that the hand consists of palm and finger regions.
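
For reference, a minimal sketch of Pratt's FOM as an edge-map similarity (α = 1/9 is the customary constant, assumed here): for each detected edge pixel, d_i is its distance to the nearest reference edge pixel.

```python
# Sketch: Pratt's Figure of Merit between two boolean edge maps.
import numpy as np
from scipy.ndimage import distance_transform_edt

def pratt_fom(ref_edges, det_edges, alpha=1.0 / 9.0):
    """FOM = (1 / max(N_ref, N_det)) * sum_i 1 / (1 + alpha * d_i**2)."""
    d = distance_transform_edt(~ref_edges)    # distance to nearest ref edge
    d_i = d[det_edges]                        # one distance per detected pixel
    n = max(int(ref_edges.sum()), int(det_edges.sum()))
    return float(np.sum(1.0 / (1.0 + alpha * d_i**2)) / n)
```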

Figure 4.15: Intraclass distance measured in terms of Pratt's FOM for samples in (a) Dataset 1 and (b) Dataset 2

Experiments on hand posture classification

  • Verification of user independence
  • Verification of view invariance
  • Improving view invariant recognition

The posture classification results of the geometric moments obtained for varying numbers of users in the training set are listed in Tables 4.4(a)-4.4(c). The samples in both of these posture classes are confused with posture 7. From Table 4.9, it should be noted that the classification accuracy is better for the test samples from Dataset 1. The samples of some of the postures from Dataset 2 with higher misclassification rates are shown in Figure 4.24. The classification results are obtained for 3600 samples comprising 2030 samples from Dataset 1 and 1570 samples from Dataset 2; the results are consolidated in Table 4.9.

Figure 4.19: Examples of the hand postures taken from Dataset 1 to form the training set.

Summary

Therefore, for the successful realization of a vision-based CBA system for Bharatanatyam, it is crucial to develop image processing techniques for the efficient description and classification of the hand postures in Bharatanatyam. In a Bharatanatyam dance video, the frames containing the hand postures are considered the key frames. It can therefore be understood that the primary task in developing a vision-based CBA system for Bharatanatyam is the recognition of the hand postures in the key frames.

Bharatanatyam and its gestures

Asamyuta hastas - the single-hand postures

The appearance of the Asamyuta hastas in Nritta does not convey any meaning; they are used to emphasize the beauty of the dance. The meanings of the hastas and some of the representations evoked through them are given. From the illustration of the Asamyuta hastas, it can be observed that each Asamyuta hasta is formed by obeying certain rules related to the spatial localization of the fingers and the bending angles at the knuckles.

Hand posture acquisition and database development

Determination of camera position

However, the values of the joint angles are not precisely defined, and variations in the joint angles are allowed to some extent depending on the comfort of the dancer and the dancer's hand geometry. These variations are not large enough to change the appearance of the posture. Since the hand postures are formed by complex finger configurations, the Asamyuta hastas in Bharatanatyam are considered complex hand postures.

Figure 5.1: Illustration of different Asamyuta hastas. The indexing as (a) and (b) represents the variations in postures as adapted by different dancers

Determination of view-angle

  • One-quarter left (1/4L): the dancer is positioned halfway between the FF and PL positions.
  • Three-quarter left (3/4L): the dancer is positioned halfway between the FB and PL positions.
  • Three-quarter right (3/4R): the dancer is positioned halfway between the FB and PR positions.

System setup

Development of Asamyuta hasta database

The figure illustrates the variations in the usage of some hastas, namely Padmakosam, Kangulam and Katakamukham 2. By including these variations, the database comprises 32 hand postures in the Asamyuta hasta group. The front view of the hand posture is obtained at the optimal viewing angle Cθ = 90◦.

Figure 5.4: Illustration of Asamyuta hastas acquired for the database. The figure illustrates the variation in the usage of some of the hastas, namely, the Padmakosam, the Kangulam and the Katakamukham 2

System implementation

  • Hand segmentation
  • Orientation normalisation
  • Normalisation for scale and translation changes
  • Extraction of DOM features
    • Comparison with other descriptors
  • Classification

The right and left views of the hand postures are obtained by moving the camera to the right and left of the focus object, respectively. The right and the left views therefore correspond to the directions in which the optical axis of the camera makes angles of 90◦−θ and 90◦+θ, respectively, with the object plane. Hence, the skin color detection method based on the hue and the in-phase color component, as explained in Section 4.3.1, can be used for the segmentation of the hand postures.
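
A minimal sketch of such hue-plus-in-phase skin detection in the spirit of Section 4.3.1 (the threshold ranges are illustrative assumptions, not the thesis values):

```python
# Sketch: skin-colour mask from the hue and the in-phase (YIQ "I") component.
import numpy as np

def skin_mask(rgb):
    """rgb: float array in [0, 1] of shape (H, W, 3); returns a boolean mask."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i_comp = 0.596 * r - 0.274 * g - 0.322 * b           # YIQ in-phase channel
    hue = np.degrees(np.arctan2(np.sqrt(3) * (g - b), 2 * r - g - b)) % 360
    return (i_comp > 0.02) & ((hue < 50) | (hue > 340))  # illustrative ranges
```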

Figure 5.5: Schematic representation of the proposed hand posture recognition system.

Experimental studies and results

Quantitative analysis on hand posture variations

Experiments on posture classification

  • Verification of user invariance
  • Verification of view invariance
  • Improving view invariant classification

Summary

Suggestions for future research

Introduction

The image obtained at an optimal viewing angle corresponds to the front view of the focus object. When the camera turns to the right from the reference position, the acquired image corresponds to the right side view of the object. Likewise, when the camera turns to the left, the resulting image corresponds to the left side view of the object.
