Hand gestures in CBA systems
Hand gesture taxonomy
The gesture system with this type of linguistic structure is known as sign language [8]. The next class under communicative gestures is the class of emblems. Unlike sign language, emblems have no linguistic structure and are simply hand gestures with specific meanings [8].
Applicability in CBA
- Application as user interface data
- Application as a data cue
They can occur independently of speech, and the gestures in this class have standard meanings that directly replace a spoken word. The application domains of such gestures include robotic systems, avatar animation, interactive gaming, and assistance systems. Such gestures also act as a language of communication in automatic sign language translation systems.
Significance of hand postures in CBA
Structure and the movements of the hand
The amount of movement varies from joint to joint, and the movements of adjacent bone segments are mutually dependent. The details of the movements associated with the joints between adjacent bone segments are given in Table 1.1.
Hand posture based user interfaces
Sensor based interfaces
The device consists of fiber optic sensors to measure finger bend and magnetic sensors to measure the orientation of the hand. The 5DT data glove consists of fiber optic sensors for measuring the joint movements of the hand [31].
Vision based interfaces
In real-time operation, the appropriate viewing angle varies with each hand posture. A multi-vision system offers the advantages of accurate reconstruction of the hand posture and elimination of occlusions [44].
Merits of vision based interfaces over sensor based interfaces
Thus, to reconstruct the hand pose accurately, a vision-based interface should either use a moving camera or several static cameras to capture the pose images from different viewing angles. Owing to the difficulties associated with multi-vision systems, monocular vision-based interfaces are widely used.
Vision based hand posture recognition: the information processing step
- Hand localization
- Hand posture modelling
- Feature extraction
- Classification
The approaches for spatial modeling of hand postures are the model-based approach and the appearance-based approach [47, 61]. In the appearance-based models, 2D images of the hand postures are used as templates.
Issues in vision based hand posture recognition
Segmentation errors
The hand pose recorded under normal illumination and the corresponding image histogram plot are shown in Figure 1.8(b) and Figure 1.9(b), respectively. Similarly, Figure 1.8(c) is an example of the hand pose image captured under relatively bright light, and the corresponding histogram plot is shown in Figure 1.9(c).
Geometrical distortions
- Geometrical transformations
- Variations in the hand posture parameter
- Variations due to the angle of view
Structural variations in a hand posture are caused by changes in the flexion of the user's hand joints within the allowed range. The figure illustrates such structural deviations, that is, deviations in the appearance of the hand posture.
Motivation for the present work
Contributions of the thesis
Organization of the thesis
The descriptor can be derived from the geometric shape in the form of the binary silhouette image obtained by the segmentation of the original image. Alternatively, the descriptor can be derived from the intensity variation in the gray level image containing the object.
Silhouette image based methods
- Geometric features
- Curvature scale space
- Modified Hausdorff distance based matching
- Fourier descriptors
- Moments and moment invariants
- Multi-fusion features
In [78], a hand posture recognition technique was proposed using the boundary profile of the hand postures as features. The Curvature Scale Space (CSS) representation of the hand contour is a boundary-based shape description method, in which features are extracted from the multiscale representation of the hand posture boundary.
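As an illustration of the CSS idea (a sketch of the general technique, not the exact procedure of [78]), the contour coordinates are smoothed with Gaussian kernels of increasing width and the zero-crossings of the curvature are tracked across scales; the function below assumes a closed contour sampled as coordinate arrays:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def curvature_zero_crossings(x, y, sigma):
    """Curvature zero-crossings of a closed contour (x, y) smoothed at
    scale sigma; stacking these over increasing sigma gives the CSS image."""
    xs = gaussian_filter1d(x.astype(float), sigma, mode="wrap")
    ys = gaussian_filter1d(y.astype(float), sigma, mode="wrap")
    dx, dy = np.gradient(xs), np.gradient(ys)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    # Signed curvature: (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2).
    kappa = (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5
    return np.where(np.diff(np.sign(kappa)) != 0)[0]
```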
Gray-level image based methods
- Edge-based features
- Orientation histograms
- Hough transform
- Image transform features
- DCT features
- PCA and LDA based features
- Wavelet transform based descriptors
- Elastic graph matching
- Local spatial pattern analysis
- Local binary patterns
- Modified census transform
- Haar-like features
- Scale invariant feature transform
- Local linear embedding
The orientation probability distribution of the gradient gives the orientation histogram of the hand posture. The performance of PCA and LDA features in hand posture classification is studied in detail. Rotational changes of the hand postures were normalized based on Gabor wavelet responses.
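As a sketch of how such an orientation histogram can be computed (the bin count and the magnitude weighting are assumptions, not the exact choices of the cited works):

```python
import numpy as np

def orientation_histogram(img, bins=36):
    """Gradient-orientation histogram of a grayscale image, weighted by
    gradient magnitude and normalized to a probability distribution."""
    gy, gx = np.gradient(img.astype(float))
    angles = np.arctan2(gy, gx)          # orientations in [-pi, pi]
    weights = np.hypot(gx, gy)           # magnitude as the vote weight
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi),
                           weights=weights)
    return hist / (hist.sum() + 1e-12)
```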
Summary and conclusion
Some of the DOPs explored for image analysis are the discrete Tchebichef polynomials [71], Krawtchouk polynomials [72], Hahn polynomials, and Racah polynomials [152]. Yap et al. [151] have shown that the discrete Hahn polynomials are a generalization of the discrete Tchebichef and Krawtchouk polynomials. This chapter presents the formulations and the spatial and frequency domain properties of the Krawtchouk and the discrete Tchebichef polynomials.
Theory of discrete orthogonal polynomials
The DOPs form the eigenfunctions of the operator $\Upsilon$ if (i) $\Upsilon$ is symmetric with respect to the weight $w(x)$, and (iii) $R$ is a constant that is assumed to be zero. Under the above conditions, the eigenfunctions of (3.8) are the polynomials $\psi_n(x)$ orthogonal with respect to $w(x)$. The discrete Rodrigues formula associated with the DOP solution in (3.15) can be derived as $\psi_n(x) = B_n\, w(x)^{-1}\, \nabla^n\!\left[w_n(x)\right]$.
Formulation of the Krawtchouk polynomials
- Rodrigues formula
- Recurrence relation
- Hypergeometric representation
- Derivation of $\|K_n\|_w^2$
- Weighted Krawtchouk polynomials (WKPs)
It is easy to verify that the Krawtchouk polynomials exhibit symmetry with respect to the parameters $n$ and $x$ [72, 157]. The Krawtchouk polynomial in (3.26) can be written in hypergeometric form as (3.34), so that the hypergeometric representation of the Krawtchouk polynomials is given by
$$K_n(x; p, N) = {}_2F_1\!\left(-n, -x; -N; \frac{1}{p}\right).$$
Using the binomial theorem, the generating function for the Krawtchouk polynomials in (3.40) can be simplified as [155, 160]
$$\sum_{n=0}^{N} \binom{N}{n} K_n(x; p, N)\, z^n = \left(1 - \frac{1-p}{p}\, z\right)^{x} (1 + z)^{N - x}.$$
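The hypergeometric series above terminates, so it can be evaluated directly. The following Python sketch evaluates $K_n(x; p, N)$ and its weighted counterpart under the normalization of [72]; it is a minimal illustration without numerical safeguards for large $N$:

```python
import numpy as np
from math import comb

def krawtchouk(n, x, p, N):
    """K_n(x; p, N) from the terminating series 2F1(-n, -x; -N; 1/p),
    for integer x in 0..N and order n <= N."""
    total, term = 1.0, 1.0
    for k in range(n):
        term *= (k - n) * (k - x) / ((k - N) * (k + 1) * p)
        total += term
    return total

def weighted_krawtchouk(n, x, p, N):
    """Weighted Krawtchouk polynomial: K_n scaled by sqrt(w(x)/rho(n)),
    with binomial weight w and squared norm rho as in [72]."""
    w = comb(N, x) * p ** x * (1 - p) ** (N - x)
    rho = ((1 - p) / p) ** n / comb(N, n)
    return krawtchouk(n, x, p, N) * np.sqrt(w / rho)
```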
Formulation of discrete Tchebichef polynomials (DTPs)
- Rodrigues formula
- Recurrence relation
- Hypergeometric representation
- Derivation of $\|T_n\|_w^2$
For $n > 1$, the trinomial inverse relation for the discrete Tchebichef polynomial can be derived as in [71, 154]. It is easy to show from (3.54) that the DTPs are symmetric with respect to $x$, as given by $t_n(N - 1 - x) = (-1)^n\, t_n(x)$.
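Analogously to the Krawtchouk case, the DTPs can be evaluated from their terminating hypergeometric series. The sketch below assumes the $N$-point domain $x = 0, \ldots, N-1$, the representation $t_n(x) = (1-N)_n\, {}_3F_2(-n, -x, 1+n; 1, 1-N; 1)$, and the squared norm $\rho(n, N) = (2n)!\binom{N+n}{2n+1}$:

```python
import numpy as np
from math import comb, factorial

def tchebichef(n, x, N):
    """t_n(x) on x = 0..N-1, from the terminating 3F2 series with
    prefactor (1-N)_n = (1-N)(2-N)...(n-N)."""
    pre = 1.0
    for j in range(n):
        pre *= (1 - N + j)
    total, term = 1.0, 1.0
    for k in range(n):
        term *= (k - n) * (k - x) * (k + 1 + n) / ((k + 1) ** 2 * (k + 1 - N))
        total += term
    return pre * total

def tchebichef_normalized(n, x, N):
    """Orthonormal DTP: t_n(x) / sqrt(rho(n, N)) with
    rho(n, N) = (2n)! * C(N + n, 2n + 1)."""
    return tchebichef(n, x, N) / np.sqrt(factorial(2 * n) * comb(N + n, 2 * n + 1))
```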
Least squares approximation of functions by DOPs
Image representation using two-dimensional DOPs
Spatial domain behaviour of the DOPs
The parameters $p_1$ and $p_2$ control the polynomial position in the vertical ($x$-axis) and horizontal ($y$-axis) directions, respectively. From the illustration, it can also be observed that the spatial support of the polynomial increases in the $x$ direction as the order $n$ increases.
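This localization behaviour can be checked numerically. The snippet below reuses the weighted_krawtchouk sketch given earlier (the chosen $N$, $n$, and $p$ values are arbitrary) and prints where the magnitude of the weighted polynomial peaks, which moves with $p$:

```python
import numpy as np

# Reuses weighted_krawtchouk from the sketch in the previous section.
N, n = 100, 4
for p in (0.3, 0.5, 0.7):
    vals = np.array([weighted_krawtchouk(n, x, p, N) for x in range(N + 1)])
    # The energy of the low-order weighted polynomial concentrates
    # around x ~ p*N, so the peak location tracks p.
    print(f"p = {p}: |weighted K_n| peaks at x = {np.argmax(np.abs(vals))}")
```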
Frequency domain behaviour of the DOPs
Quantitative analysis
The peak frequency $\omega_p$ is the frequency at which the energy of the function is highest. It can be seen from the table that the peak frequencies of the normalized DTPs are relatively smaller than those of the WKPs of the same order. Furthermore, it is also observed that the bandwidth of the normalized DTPs increases with the order.
Short-time Fourier transform (STFT) analysis
The length of the sliding window $\xi(\cdot)$ is chosen as 30 and the number of frequency points as 128. The illustration shows that for order $n < N/2 + 1$, the low-frequency ESD of the polynomial increases for values of $x$ near $x = 0$ and $x = N$.
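A sketch of such an STFT analysis using SciPy is given below; the window length of 30 and the 128 frequency points follow the text, while the Hann window itself is an assumption (the weighted_krawtchouk helper is the one sketched earlier):

```python
import numpy as np
from scipy.signal import stft

# Weighted Krawtchouk basis function (reusing the earlier sketch);
# N, n and p are arbitrary illustrative choices.
N, n, p = 200, 10, 0.5
psi = np.array([weighted_krawtchouk(n, x, p, N) for x in range(N + 1)])

# Window length 30 and 128 frequency points follow the text;
# the Hann window itself is an assumption.
freqs, positions, Z = stft(psi, window="hann", nperseg=30, nfft=128)
esd = np.abs(Z) ** 2  # energy spectral density over position and frequency
```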
Shape approximation using DOPs
Metrics for reconstruction accuracy
The similarity between the compared shapes $f$ and $\hat{f}$ is high if the corresponding MHD is small. The reconstruction accuracy of the DOMs is quantitatively compared using the values of the SSIM index and the MHD. The performance of the orthogonal moments is analyzed by varying the order of the moments used for the approximation.
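Assuming the common Dubuisson–Jain definition of the modified Hausdorff distance, a minimal sketch is:

```python
import numpy as np
from scipy.spatial.distance import cdist

def modified_hausdorff(A, B):
    """Dubuisson-Jain modified Hausdorff distance between two point
    sets (rows are coordinates of shape/boundary pixels)."""
    D = cdist(A, B)                  # all pairwise Euclidean distances
    d_ab = D.min(axis=1).mean()      # mean nearest-neighbour distance A -> B
    d_ba = D.min(axis=0).mean()      # mean nearest-neighbour distance B -> A
    return max(d_ab, d_ba)
```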
Experiments on shape representation
- Characterizing shapes using curvature properties
- Spatial scale of the shapes
- Variation in shapes versus reconstruction accuracy
- Noise versus reconstruction accuracy
The shapes reconstructed from the Krawtchouk and discrete Tchebichef moment approximations of the noisy shapes are given in Figure 3.18(c). The corresponding SSIM index and MHD plots are shown in Figure 3.19(d) and Figure 3.19(e), respectively. As the order increases, the performances of the Krawtchouk moments and the discrete Tchebichef moments become almost similar.
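A compact sketch of the underlying approximation step, moment computation followed by least-squares reconstruction with an orthonormal 1D basis (for example, the tchebichef_normalized sketch given earlier), is:

```python
import numpy as np

def dom_reconstruct(f, basis, order):
    """Moments of an N x N shape image f up to `order`, followed by the
    least-squares reconstruction; `basis(n, x, N)` must return the value
    of an orthonormal 1D polynomial (e.g. tchebichef_normalized above)."""
    N = f.shape[0]
    Psi = np.array([[basis(n, x, N) for x in range(N)]
                    for n in range(order + 1)])
    M = Psi @ f @ Psi.T    # 2D moments M_nm = sum_x sum_y psi_n(x) psi_m(y) f(x, y)
    return Psi.T @ M @ Psi  # approximation f_hat from the retained moments
```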
Experiments on shape classification
The distance is measured in terms of the similarity in the spatial distribution of the pixels of the compared shapes. The consolidated classification results obtained for each shape class in the test data are given by the plot in Figure 3.28. The consolidated plot of the classification results obtained for each shape class with respect to the extended training set is given in Figure 3.30.
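A minimal sketch of such a minimum-distance classifier, with the metric left as a parameter (for example, the modified_hausdorff sketch above applied to foreground pixel coordinates), is:

```python
import numpy as np

def classify_min_distance(test_shape, train_shapes, train_labels, dist):
    """Assign the label of the nearest training shape under `dist`,
    e.g. modified_hausdorff on foreground pixel coordinates."""
    distances = [dist(test_shape, s) for s in train_shapes]
    return train_labels[int(np.argmin(distances))]
```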
Summary
Appendix: Proof of the QMF property of the WKP basis
This chapter presents the proposed method and the experimental studies that comparatively validate the DOMs as hand posture features. The hand posture recognition system developed in this work addresses the three main issues in hand shape interpretation, including the segmentation of the forearm and the extraction of the hand region. The section on system implementation presents the procedures and techniques involved in realizing the hand posture recognition system.
Hand posture acquisition and database development
- Determination of camera position
- Determination of view-angle
- System setup
- Development of hand posture database
In real time, the optimal position of the camera for capturing the hand pose depends on the application. The viewing angle ($C_\theta$) is measured relative to the $x$–$y$ plane. The variations in the camera's viewing angle with respect to the hand region are illustrated in Figure 4.4(b).
System implementation
- Hand detection and segmentation
- Normalization techniques
- Proposed method for rule based hand extraction
- Proposed approach to orientation correction
- Normalization of scale and spatial translation
- Feature extraction
- Extraction of moment shape descriptors
- Extraction of non-moment shape descriptors
- Classification
Also, the width of a finger (the maximum EDT value within its section) is much smaller than that of the palm and the forearm. The orientation of the hand region can be assumed to imply the orientation of the hand posture.
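As an illustrative sketch of this width cue (not the thesis's exact rule set; the threshold width_ratio is an assumption), the local half-width of the arm can be read off the Euclidean distance transform of the binary mask:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def thin_region_rows(mask, width_ratio=0.5):
    """Rows of the binary arm mask whose local half-width (the row-wise
    maximum of the Euclidean distance transform) is below a fraction of
    the global maximum, i.e. candidate finger/wrist rows; width_ratio
    is an illustrative threshold."""
    edt = distance_transform_edt(mask)   # distance of each pixel to background
    half_width = edt.max(axis=1)         # local half-width along the arm
    return half_width < width_ratio * half_width.max()
```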
Experimental studies and results
Quantitative analysis of hand posture variations
The within-class standard deviation plot of the FOMs shown in Figure 4.16 illustrates the variability of Pratt's FOM values with respect to each class. The plots are created by averaging the correlation values obtained with respect to the samples in each posture class. It is known that the hand consists of palm and finger regions.
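For reference, Pratt's FOM between a reference boundary map and a detected one is $\mathrm{FOM} = \frac{1}{\max(N_I, N_D)} \sum_{i=1}^{N_D} \frac{1}{1 + \alpha d_i^2}$, with $\alpha = 1/9$ the customary scaling; a minimal sketch assuming binary masks:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def pratt_fom(ideal, detected, alpha=1.0 / 9.0):
    """Pratt's figure of merit between binary reference and detected
    boundary maps; alpha = 1/9 is the customary scaling constant."""
    # Distance of every pixel to the nearest reference boundary pixel.
    d = distance_transform_edt(~ideal.astype(bool))
    scores = 1.0 / (1.0 + alpha * d[detected.astype(bool)] ** 2)
    return scores.sum() / max(ideal.sum(), detected.sum())
```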
Experiments on hand posture classification
- Verification of user independence
- Verification of view invariance
- Improving view invariant recognition
The posture classification results of the geometric moments obtained for varying numbers of users in the training set are listed in Tables 4.4(a) - 4.4(c). The classification results are obtained for 3600 samples, comprising 2030 samples from Dataset 1 and 1570 samples from Dataset 2, and are consolidated in Table 4.9. From Table 4.9, it should be noted that the classification accuracy is better for the test samples from Dataset 1. The samples of some of the postures from Dataset 2 with higher misclassification rates are shown in Figure 4.24; the samples in both these posture classes are confused with posture 7.
Summary
Therefore, for the successful realization of a vision-based CBA system for Bharatanatyam, it is crucial to develop image processing techniques for the efficient description and classification of the hand postures in Bharatanatyam. In a Bharatanatyam dance video, the frames containing the hand postures are considered the key frames. It can therefore be understood that the primary requirement in developing a vision-based CBA system for Bharatanatyam is the recognition of the hand postures in the key frames.
Bharatanatyam and its gestures
Asamyuta hastas - the single-hand postures
The appearance of the Asamyuta hastas in Nritta does not convey any meaning; they are used to emphasize the beauty of the dance. The meanings of the hastas and some of the representations evoked through them are given. From the illustration of the Asamyuta hastas, it can be observed that each Asamyuta hasta is formed by obeying certain rules related to the spatial localization of the fingers and the bending angles at the knuckles.
Hand posture acquisition and database development
Determination of camera position
However, the values of the joint angles are not precisely defined, and variations in the joint angles are allowed to some extent depending on the comfort of the dancer and the dancer's hand geometry. These variations are not large enough to change the appearance of the posture. Since the hand postures are formed by complex finger configurations, the Asamyuta hastas in Bharatanatyam are considered complex hand postures.
Determination of view-angle
One-quarter left (1/4L): the dancer is positioned halfway between the FF and PL positions. Three-quarter left (3/4L): the dancer is positioned halfway between the FB and PL positions. Three-quarter right (3/4R): the dancer is positioned halfway between the FB and PR positions.
System setup
Development of Asamyuta hasta database
The figure illustrates the variations in the usage of some hastas, namely Padmakosam, Kangulam and Katakamukham 2. With these variations included, the database comprises 32 hand postures in the Asamyuta hasta group. The front view of the hand posture is obtained at the optimal viewing angle $C_\theta = 90°$.
System implementation
- Hand segmentation
- Orientation normalisation
- Normalisation for scale and translation changes
- Extraction of DOM features
- Comparison with other descriptors
- Classification
The right and left views of the hand postures are obtained by moving the camera to the right and to the left of the focus object, respectively. The right and the left views therefore correspond to the directions in which the optical axis of the camera is at an angle of $90° - \theta$ and $90° + \theta$, respectively, with respect to the object plane. The skin color detection method based on the hue and the in-phase color components, as explained in Section 4.3.1, can therefore be used for the segmentation of the hand postures.
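A minimal sketch of such a hue and in-phase (YIQ "I") based skin detector is given below; both thresholds are illustrative placeholders rather than the values tuned in Section 4.3.1:

```python
import numpy as np
import matplotlib.colors as mcolors

def skin_mask(rgb, hue_max=0.11, i_min=0.02):
    """Skin detection from the hue (HSV) and in-phase 'I' (YIQ)
    components; hue_max and i_min are illustrative placeholders."""
    rgb = rgb.astype(float) / 255.0
    hue = mcolors.rgb_to_hsv(rgb)[..., 0]           # hue in [0, 1)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i_comp = 0.596 * r - 0.274 * g - 0.322 * b      # NTSC YIQ 'I' component
    return (hue < hue_max) & (i_comp > i_min)
```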
Experimental studies and results
Quantitative analysis on hand posture variations
Experiments on posture classification
- Verification of user invariance
- Verification of view invariance
- Improving view invariant classification
Summary
Suggestions for future research
Introduction
The image obtained at an optimal viewing angle corresponds to the front view of the focus object. When the camera turns to the right from the reference position, the acquired image corresponds to the right side view of the object. Likewise, when the camera turns to the left, the resulting image corresponds to the left side view of the object.