Hand Posture Recognition Using Discrete Orthogonal Moments



Hand gestures in CBA systems

Hand gesture taxonomy

A gesture system with this type of linguistic structure is known as a sign language [8]. The next class under communicative gestures is the class of emblems. Unlike sign language, emblems have no linguistic structure and are simply hand gestures with specific meanings [8].

Applicability in CBA

  • Application as user interface data
  • Application as a data cue

Emblems can occur independently of speech, and the gestures in this class have standard meanings that directly replace a spoken word. The application domains of such gestures include robotic systems, avatar animation, interactive gaming and assistance systems. Such gestures also act as a language of communication in automatic sign translation systems.

Significance of hand postures in CBA

Structure and the movements of the hand

The amount of movement varies from joint to joint, and the movements of the adjacent bone segments are partly interdependent. The details of the movements associated with the joints between the adjacent bone segments are given in Table 1.1.

Figure 1.1: Illustration of anatomy of the human hand explaining the bone segments and the joints of the hand

Hand posture based user interfaces

Sensor based interfaces

The device consists of fiber optic sensors to measure finger bending and magnetic sensors to measure the orientation of the hand. The 5DT data glove consists of fiber optic sensors for measuring the joint movements of the hand [31].

Figure 1.3: Examples of hand postures to illustrate the variations in the hand shape relative to the anatomical movements of the hand joints

Vision based interfaces

In real time, the choice of viewing angle varies with each hand posture. A multi-vision system offers the advantages of accurate reconstruction of the hand posture and elimination of occlusions [44].

Merits of vision based interfaces over sensor based interfaces

Thus, to recreate the hand pose accurately, the vision-based interface should either use a moving camera or several still cameras to capture the pose images at different viewing angles. Due to the difficulties associated with the multi-vision systems, the monocular vision-based interfaces are widely used.

Vision based hand posture recognition: the information processing step

  • Hand localization
  • Hand posture modelling
  • Feature extraction
  • Classification

The approaches for spatial modeling of hand postures are the model-based approach and the appearance-based approach [47, 61]. In the appearance-based models, 2D images of the hand postures are used as templates.

Figure 1.7: Illustration of different hand posture models. (a) 3D textured volumetric model; (b) 3D wireframe volumetric model; (c) 3D skeletal model; (d) Binary silhouette and (e) Contour

Issues in vision based hand posture recognition

Segmentation errors

The hand posture recorded under normal illumination and the corresponding image histogram are shown in Figure 1.8(b) and Figure 1.9(b), respectively. Similarly, Figure 1.8(c) is an example of a hand posture image captured under relatively bright light, and the corresponding histogram is shown in Figure 1.9(c).
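
As a rough illustration of this effect, the following is a minimal sketch (not the thesis code; the file names and the use of Otsu's method are assumptions) showing how a histogram-derived global threshold drifts with illumination, which is one source of the segmentation errors discussed here.

```python
# Minimal sketch: histogram and global Otsu threshold of a hand posture
# image. File names are placeholders; Otsu is an assumed example of a
# histogram-derived threshold, not the thesis segmentation method.
import cv2

def histogram_and_otsu(path):
    """Return the 256-bin histogram and the Otsu threshold of a grayscale image."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    thresh, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return hist, thresh

# Brighter illumination shifts the histogram toward higher gray levels,
# so the derived threshold (and hence the hand/background split) drifts.
for name in ["pose_normal.png", "pose_bright.png"]:  # placeholder files
    _, t = histogram_and_otsu(name)
    print(f"{name}: Otsu threshold = {t}")
```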

Figure 1.8: Illustration of variations in the details of the hand posture image with respect to illumination changes

Geometrical distortions

  • Geometrical transformations
  • Variations in the hand posture parameters
  • Variations due to the angle of view

Structural variations in a hand posture are caused by changes in the degree of movement of the user's hand joints within the allowed range. The figure illustrates such structural variations, that is, deviations in the appearance of the hand posture.

Figure 1.11: Illustration of hand posture parameters using the hand skeleton. The joint angles represent the hand posture parameters.

Motivation for the present work

Contributions of the thesis

Organization of the thesis

The descriptor can be derived from the geometric shape in the form of the binary silhouette image obtained by the segmentation of the original image. Alternatively, the descriptor can be derived from the intensity variation in the gray level image containing the object.

Silhouette image based methods

  • Geometric features
  • Curvature scale space
  • Modified Hausdorff distance based matching
  • Fourier descriptors
  • Moments and moment invariants
  • Multi-fusion features

In [78], a hand posture recognition technique was proposed using the boundary profile of the hand postures as features. The Curvature Scale Space (CSS) representation of the hand shape is a boundary-based shape description method. In the RLP phase, features were extracted from the multiscale space representation of the hand postures.
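
A minimal sketch of the CSS idea follows (an illustration under stated assumptions, not the method of [78]): the closed boundary (x(u), y(u)) is smoothed by Gaussians of increasing σ, and the inflection points, i.e. the zero crossings of the curvature, are tracked across scales.

```python
# Minimal CSS sketch: smooth a closed contour at increasing sigma and
# locate the inflection points (sign changes of the curvature).
import numpy as np
from scipy.ndimage import gaussian_filter1d

def css_inflection_points(x, y, sigmas):
    """Map each sigma to the boundary indices where curvature changes sign."""
    result = {}
    for s in sigmas:
        xs = gaussian_filter1d(x, s, mode="wrap")   # wrap: closed contour
        ys = gaussian_filter1d(y, s, mode="wrap")
        dx, dy = np.gradient(xs), np.gradient(ys)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        kappa = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
        result[s] = np.where(np.diff(np.sign(kappa)) != 0)[0]
    return result
```

As σ grows, the boundary smooths out and the inflection points vanish in pairs; the (u, σ) positions of these events form the CSS image used as the shape descriptor.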

Figure 2.1: Illustration of smoothing of the shape boundary and the evolution of the inflection points at different scales (σ)

Gray-level image based methods

  • Edge-based Features
    • Orientation histograms
    • Hough transform
  • Image transform features
    • DCT features
    • PCA and LDA based features
    • Wavelet transform based descriptors
  • Elastic Graph matching
  • Local spatial pattern analysis
    • Local binary patterns
    • Modified census transform
    • Haar-like features
    • Scale invariant feature transform
  • Local linear embedding

The orientation probability distribution of the gradient gives the orientation histogram of the hand posture. The performance of PCA and LDA features in hand posture classification has been studied in detail. Rotational changes of the hand postures were normalized based on the Gabor wavelet responses.
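
A minimal sketch of such an orientation histogram (an illustrative assumption rather than the exact formulation in the cited works): gradient directions are quantized into bins and weighted by the gradient magnitude, giving the orientation probability distribution.

```python
# Minimal sketch: magnitude-weighted histogram of gradient orientations.
import cv2
import numpy as np

def orientation_histogram(gray, n_bins=36):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)        # angles in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)                 # probability distribution
```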

Figure 2.3: Plots of the real part of the Gabor wavelet kernels G_{ϑ,θ} obtained at 4 scales (P = 4) and 8 orientations (Q = 8).

Summary and conclusion

Some of the DOPs explored for image analysis are the discrete Tchebichef polynomials [71], the Krawtchouk polynomials [72], the Hahn polynomials, and the Racah polynomials [152]. Yap et al. [151] have shown that the discrete Hahn polynomials are a generalization of the discrete Tchebichef and Krawtchouk polynomials. This chapter presents the formulations and the spatial and frequency domain properties of the Krawtchouk and the discrete Tchebichef polynomials.

Theory of discrete orthogonal polynomials

The DOPs form the eigenfunctions of the operator Υ if, among other conditions, (i) Υ is symmetric with respect to the weight w(x) and (iii) R is a constant that is assumed to be zero. Due to these conditions, the eigenfunctions of (3.8) are the polynomials ψ_n(x) orthogonal with respect to w(x). The discrete Rodrigues formula associated with the DOP solution in (3.15) can be derived as ψ_n(x) = B_n w(x)^{-1} ∇^n [w_n(x)], where B_n is a normalizing constant and w_n(x) is the weight associated with the n-th order difference.

Formulation of the Krawtchouk polynomials

  • Rodrigues formula
  • Recurrence relation
  • Hypergeometric representation
  • Derivation of ‖K_n‖²_w
  • Weighted Krawtchouk polynomials (WKPs)

It is easy to verify that the Krawtchouk polynomials exhibit symmetry with respect to the parameters n and x [72, 157]. The Krawtchouk polynomial in (3.26) can be rewritten in the form given in (3.34). The hypergeometric representation of the Krawtchouk polynomials is thus K_n(x; p, N) = ₂F₁(−n, −x; −N; 1/p). Using the binomial theorem, the generating function for the Krawtchouk polynomials in (3.40) can be simplified as in [155, 160].
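
The following is a sketch of the WKPs via the standard three-term recurrence and the weighting K̄_n(x) = K_n(x)·sqrt(w(x)/ρ(n)) used in the Krawtchouk-moment literature [72]; the normalization conventions here are assumptions and may differ in detail from (3.26)-(3.40).

```python
# Sketch: weighted Krawtchouk polynomials on x = 0..N-1 (assumes 1 <= n_max <= N-1).
import numpy as np
from scipy.special import comb

def weighted_krawtchouk(N, p, n_max):
    """Rows of the result are Kbar_n(x; p, N-1) for n = 0..n_max."""
    x = np.arange(N)
    M = N - 1
    K = np.zeros((n_max + 1, N))
    K[0] = 1.0
    K[1] = 1.0 - x / (p * M)
    for n in range(1, n_max):      # three-term recurrence in the order n
        K[n + 1] = ((p * (M - n) + n * (1 - p) - x) * K[n]
                    - n * (1 - p) * K[n - 1]) / (p * (M - n))
    n_all = np.arange(n_max + 1)
    w = comb(M, x) * p**x * (1 - p)**(M - x)             # binomial weight w(x)
    rho = ((1 - p) / p) ** n_all / comb(M, n_all)        # squared norms rho(n)
    return K * np.sqrt(w / rho[:, None])                 # orthonormal rows
```

With P = weighted_krawtchouk(64, 0.5, 63), the rows satisfy P @ P.T ≈ I, which is the orthonormality exploited when computing moments.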

Formulation of discrete Tchebichef polynomials (DTPs)

  • Rodrigues formula
  • Recurrence relation
  • Hypergeometric representation
  • Derivation of ‖T_n‖²_w

For n > 1, the trinomial inverse relation for the discrete Tchebichef polynomials can be derived as in [71, 154]. It is easy to show from (3.54) that the DTPs are symmetric with respect to x, i.e., t_n(N−1−x) = (−1)^n t_n(x).
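
A sketch of the scaled DTPs via the well-known three-term recurrence follows (the scaling convention is an assumption and may differ from (3.54) and [71, 154]); the symmetry noted above appears as t_n(N−1−x) = (−1)^n t_n(x).

```python
# Sketch: scaled discrete Tchebichef polynomials on x = 0..N-1.
import numpy as np

def tchebichef(N, n_max):
    """Rows of the result are t_n(x) for n = 0..n_max (n_max >= 1)."""
    x = np.arange(N, dtype=float)
    T = np.zeros((n_max + 1, N))
    T[0] = 1.0
    T[1] = (2 * x + 1 - N) / N
    for n in range(2, n_max + 1):   # three-term recurrence in the order n
        T[n] = ((2 * n - 1) * T[1] * T[n - 1]
                - (n - 1) * (1.0 - ((n - 1) / N) ** 2) * T[n - 2]) / n
    return T   # satisfies t_n(N-1-x) = (-1)**n * t_n(x)
```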

Least squares approximation of functions by DOPs

Image representation using two-dimensional DOPs

Spatial domain behaviour of the DOPs

The parameters p1 and p2 control the polynomial position in the vertical (x-axis) and horizontal (y-axis) directions, respectively. From the illustration, it can also be observed that the spatial support of the polynomial increases in the x direction as the value of n increases.
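
As a minimal sketch of how the 2D moments follow separably from such 1D bases (using the weighted_krawtchouk sketch above; the matrix convention is an assumption), choosing p1 and p2 away from 0.5 shifts the emphasized region of the image, as described.

```python
# Sketch: separable 2D moment analysis/synthesis with orthonormal DOP bases.
import numpy as np

def dom_analysis(f, P1, P2):
    """2D moments M[n, m] = sum_x sum_y P1[n, x] * P2[m, y] * f[x, y]."""
    return P1 @ f @ P2.T

def dom_synthesis(M, P1, P2):
    """Reconstruction f_hat from (possibly truncated) moments M."""
    return P1.T @ M @ P2

# Example: P1 = weighted_krawtchouk(N, p1, n_max), P2 = weighted_krawtchouk(N, p2, n_max);
# truncating n_max keeps only the low-order moments of the hand shape.
```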

Figure 3.1: Plots of the WKPs for different values of p and order n. The plots illustrate the translation of K_n(x) with respect to the value of p

Frequency domain behaviour of the DOPs

Quantitative analysis

The peak frequency ω_p is the frequency at which the energy of the function is highest. It can be seen from the table that the peak frequencies of the normalized DTPs are relatively smaller than those of the WKPs of the same order. Furthermore, it is also observed that the bandwidth of the normalized DTPs increases with the order.

Short-time Fourier transform (STFT) analysis

The illustration shows that for order n < N/2 + 1, the low-frequency ESD of the polynomial increases for values of x near x = 0 and x = N. The length of the sliding window ξ(·) is chosen as 30 and the number of frequency points is 128.
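
A minimal sketch of this STFT analysis follows, with the window length 30 and 128 frequency points as stated; the Hann window is an assumption, since the window type is not restated here.

```python
# Sketch: position-dependent energy spectral density of a 1D basis function.
import numpy as np
from scipy.signal import stft

def polynomial_esd(k):
    """STFT-based ESD of the sequence k(x), x = 0..N-1."""
    f, x_pos, Z = stft(k, window="hann", nperseg=30, nfft=128)
    return f, x_pos, np.abs(Z) ** 2   # ESD over frequency and position x
```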

Figure 3.7: Plots of the 1D WKPs and corresponding ESD obtained using STFT as functions of x

Shape approximation using DOPs

Metrics for reconstruction accuracy

The similarity between the compared shapes f and f̂ is high if the corresponding MHD is small. The reconstruction accuracy of the DOMs is quantitatively compared using the values of the SSIM index and the MHD. The performance of the orthogonal moments is analyzed by varying the order of the moments used for the approximation.
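
A minimal sketch of the two metrics follows; the SSIM call uses scikit-image, and the MHD follows the usual Dubuisson-Jain definition, which is assumed here.

```python
# Sketch: SSIM between shape images and MHD between boundary point sets.
import numpy as np
from skimage.metrics import structural_similarity

def mhd(A, B):
    """Modified Hausdorff distance between point sets A, B of shape (n, 2)."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(D.min(axis=1).mean(), D.min(axis=0).mean())

def reconstruction_scores(f, f_hat, pts_f, pts_fhat):
    """Higher SSIM and lower MHD both indicate a more faithful reconstruction."""
    ssim = structural_similarity(f, f_hat, data_range=float(f.max() - f.min()))
    return ssim, mhd(pts_f, pts_fhat)
```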

Experiments on shape representation

  • Characterizing shapes using curvature properties
  • Spatial scale of the shapes
  • Variation in shapes versus reconstruction accuracy
  • Noise versus reconstruction accuracy

The shapes reconstructed from the discrete Krawtchouk and Tchebichef moment approximations of the noisy shapes are given in Figure 3.18(c). The SSIM index and MHD plots are shown in Figure 3.19(d) and Figure 3.19(e), respectively. As the order increases, the performance of the Krawtchouk moments and the discrete Tchebichef moments becomes almost similar.

Figure 3.9: Illustration of finding the concave segments of a shape from the curvature function derived from the corresponding shape boundary

Experiments on shape classification

The distance is measured in terms of the similarity in the spatial distribution of pixels. The comprehensive scores of the classification results obtained for each shape class in the test data are given by the plot in Figure 3.28. The consolidated plot of the classification results obtained for each shape class with respect to the extended training set is given in Figure 3.30.

Figure 3.24: Illustration of undistorted training sample per shape class constituting the reference dataset.

Summary

Appendix: Proof of the QMF property of the WKP basis

This chapter presents the proposed method and the experimental studies that comparatively validate the DOMs as hand posture features. The hand posture recognition system developed in this work addresses the three main issues in hand shape interpretation, including the segmentation of the forearm and the extraction of the hand region. The section on system implementation presents the procedures and techniques involved in realizing the hand posture recognition system.

Figure 4.1: Illustration of a tabletop user interface setup using a top-mounted camera for natural human-computer interaction through hand postures.

Hand posture acquisition and database development

  • Determination of camera position
  • Determination of view-angle
  • System setup
  • Development of Hand posture database

In real time, the optimal position of the camera for acquiring the hand posture depends on the application. The viewing angle (Cθ) is measured relative to the x−y plane. The variations in the camera's viewing angle with respect to the hand region are illustrated in Figure 4.4(b).

Figure 4.3: A schematic representation of the experimental setup employed for acquiring the hand posture images.

System Implementation

  • Hand detection and segmentation
  • Normalization techniques
    • Proposed method for rule based hand extraction
    • Proposed approach to orientation correction
    • Normalization of scale and spatial translation
  • Feature Extraction
    • Extraction of moment shape descriptors
    • Extraction of non-moment shape descriptors
  • Classification

Also, the width of a finger (the maximum EDT value in that cross-section) is much smaller than that of the palm and the forearm; the EDT-based rule sketched below exploits this. The orientation of the hand can be assumed to give the orientation of the hand posture.
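
A minimal sketch of such an EDT-based rule (the threshold and the hand-up orientation are illustrative assumptions, not the thesis values): the maximum EDT value per row peaks over the palm and drops over the fingers and the wrist, which can be used to cut away the forearm.

```python
# Sketch: forearm removal from a binary silhouette using the Euclidean
# distance transform (EDT); assumes the fingers point toward row 0.
import numpy as np
from scipy.ndimage import distance_transform_edt

def cut_forearm(mask, ratio=0.5):
    """Zero out rows below the palm where the silhouette narrows."""
    edt = distance_transform_edt(mask)
    profile = edt.max(axis=1)                 # max inscribed radius per row
    palm_row = int(profile.argmax())          # widest part: the palm
    narrow = np.where(profile[palm_row:] < ratio * profile.max())[0]
    out = mask.copy()
    if narrow.size:                           # first narrow row below the palm
        out[palm_row + narrow[0]:, :] = 0     # drop the wrist and forearm
    return out
```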

Figure 4.6: Schematic representation of the proposed hand posture recognition technique.

Experimental Studies and Results

Quantitative analysis of hand posture variations

The within-class standard deviation plot of Pratt's FOM shown in Figure 4.16 shows the variability in the FOM values relative to each class. The plots are created by averaging the correlation values obtained with respect to the samples in each posture class. It is known that the hand consists of palm and finger regions.
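
For reference, a minimal sketch of Pratt's FOM as an edge-map similarity (α = 1/9 is the customary constant, assumed here): for each detected edge pixel, d_i is its distance to the nearest reference edge pixel.

```python
# Sketch: Pratt's Figure of Merit between two boolean edge maps.
import numpy as np
from scipy.ndimage import distance_transform_edt

def pratt_fom(ref_edges, det_edges, alpha=1.0 / 9.0):
    """FOM = (1 / max(N_ref, N_det)) * sum_i 1 / (1 + alpha * d_i**2)."""
    d = distance_transform_edt(~ref_edges)    # distance to nearest ref edge
    d_i = d[det_edges]                        # one distance per detected pixel
    n = max(int(ref_edges.sum()), int(det_edges.sum()))
    return float(np.sum(1.0 / (1.0 + alpha * d_i**2)) / n)
```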

Figure 4.15: Intraclass distance measured in terms of Pratt's FOM for samples in (a) Dataset 1 and (b) Dataset 2

Experiments on hand posture classification

  • Verification of user independence
  • Verification of view invariance
  • Improving view invariant recognition

The posture classification results of the geometric moments obtained for varying numbers of users in the training set are listed in Tables 4.4(a)-4.4(c). The samples in both of these posture classes are confused with posture 7. From Table 4.9, it should be noted that the classification accuracy is better for the test samples from Dataset 1. The samples of some of the postures from Dataset 2 with higher misclassification rates are shown in Figure 4.24. The classification results are obtained for 3600 samples comprising 2030 samples from Dataset 1 and 1570 samples from Dataset 2; the results are consolidated in Table 4.9.

Figure 4.19: Examples of the hand postures taken from Dataset 1 to form the training set.

Summary

Therefore, for the successful realization of a vision-based CBA system for Bharatanatyam, it is crucial to develop image processing techniques for the efficient description and classification of the hand postures in Bharatanatyam. In a Bharatanatyam dance video, the frames containing the hand postures are considered the key frames. It can therefore be understood that the primary task in developing a vision-based CBA system for Bharatanatyam is the recognition of the hand postures in the key frames.

Bharatanatyam and its gestures

Asamyuta hastas - the single-hand postures

The appearance of the Asamyuta hastas in Nritta does not convey any meaning; they are used to emphasize the beauty of the dance. The meanings of the hastas and some of the representations evoked through them are given. From the illustration of the Asamyuta hastas, it can be observed that each Asamyuta hasta is formed by obeying certain rules related to the spatial localization of the fingers and the bending angles at the knuckles.

Hand posture acquisition and database development

Determination of camera position

However, the values of the joint angles are not precisely defined, and variations in the joint angles are allowed to some extent depending on the comfort of the dancer and the dancer's hand geometry. These variations are not large enough to change the appearance of the posture. Since the hand postures are formed by complex finger configurations, the Asamyuta hastas in Bharatanatyam are considered complex hand postures.

Figure 5.1: Illustration of different Asamyuta hastas. The indexing as (a) and (b) represents the variations in postures as adapted by different dancers

Determination of view-angle

  • One-quarter left (1/4L): the dancer is positioned halfway between the FF and PL positions.
  • Three-quarter left (3/4L): the dancer is positioned halfway between the FB and PL positions.
  • Three-quarter right (3/4R): the dancer is positioned halfway between the FB and PR positions.

System setup

Development of Asamyuta hasta database

The figure illustrates the variations in the usage of some hastas, namely Padmakosam, Kangulam and Katakamukham 2. By including these variations, the database comprises 32 hand postures in the Asamyuta hasta group. The front view of the hand posture is obtained at the optimal viewing angle Cθ = 90◦.

Figure 5.4: Illustration of Asamyuta hastas acquired for the database. The figure illustrates the variation in the usage of some of the hastas, namely, the Padmakosam, the Kangulam and the Katakamukham 2

System implementation

  • Hand segmentation
  • Orientation normalisation
  • Normalisation for scale and translation changes
  • Extraction of DOM features
    • Comparison with other descriptors
  • Classification

The right and left views of the hand postures are obtained by moving the camera to the right and left of the focus object, respectively. The right and the left views therefore correspond to the directions in which the optical axis of the camera makes angles of 90◦−θ and 90◦+θ, respectively, with the object plane. Hence, the skin color detection method based on the hue and the in-phase color component, as explained in Section 4.3.1, can be used for the segmentation of the hand postures.
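
A minimal sketch of such hue-plus-in-phase skin detection in the spirit of Section 4.3.1 (the threshold ranges are illustrative assumptions, not the thesis values):

```python
# Sketch: skin-colour mask from the hue and the in-phase (YIQ "I") component.
import numpy as np

def skin_mask(rgb):
    """rgb: float array in [0, 1] of shape (H, W, 3); returns a boolean mask."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i_comp = 0.596 * r - 0.274 * g - 0.322 * b           # YIQ in-phase channel
    hue = np.degrees(np.arctan2(np.sqrt(3) * (g - b), 2 * r - g - b)) % 360
    return (i_comp > 0.02) & ((hue < 50) | (hue > 340))  # illustrative ranges
```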

Figure 5.5: Schematic representation of the proposed hand posture recognition system.

Experimental studies and results

Quantitative analysis on hand posture variations

Experiments on posture classification

  • Verification of user invariance
  • Verification of view invariance
  • Improving view invariant classification

Summary

Suggestions for future research

Introduction

The image obtained at an optimal viewing angle corresponds to the front view of the focus object. When the camera turns to the right from the reference position, the acquired image corresponds to the right side view of the object. Likewise, when the camera turns to the left, the resulting image corresponds to the left side view of the object.
