
1.3 Hand posture based user interfaces

From the discussion on the anatomical movements of the hand joints, it is evident that the appearance of a hand shape depends on the angles made by the finger joints. Thus, the cues acquired by the gesture interface device for HCI can be direct measurements of the parameters defining the anatomical motion, or they can be visual cues such as colour, texture, disparity and geometry [20]. The gesture based user interfaces for HCI are broadly classified as

1. Sensor based interface
2. Vision based interface

A brief outline of these gesture interfaces, together with their advantages and limitations, is given below.

1.3.1 Sensor based interfaces

The sensor based interfaces are electronic devices that employ sensors to provide the computer with information about the motion, the orientation and the position of the fingers. The key element in a sensor based interface is a hand glove to which the flex sensors, the abduction sensors and the palm-arch sensors are attached [21].


(Figure 1.3 panel labels: adduction of the fingers with flexion of PIP and MP; adduction of the thumb; flexion of DIP and PIP with extension of MP; thumb flexion of IP and MP; index finger extension of MP with flexion of PIP and DIP; flexion of all the joints; thumb flexion of MP and TMC; thumb extension; abduction/adduction combined with extension or flexion.)

Figure 1.3: Examples of hand postures to illustrate the variations in the hand shape relative to the anatomical movements of the hand joints. Image courtesy wikimedia.org/wiki/File:ABC pict.png

The flex sensors are placed at the finger joints to measure the joint angles, and the abduction sensors are placed between adjacent fingers to measure the abduction angles.

The palm-arch sensors measure the bending of the palm. Along with these, additional sensors such as magnetic or acoustic sensors are used to measure the relative orientation and position of the hand in three dimensional (3D) space [22, 23]. The angular and positional information measured by the sensors is then passed to the computer through a wired or wireless connection. Such sensor based hand gloves are generally known as instrumented gloves or data gloves.
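To make the data flow from such a glove concrete, the short Python sketch below models one frame of sensor readings together with a linear flex sensor calibration. The field names, sensor counts and calibration constants are illustrative assumptions only and do not describe any particular commercial glove.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GloveFrame:
    # One hypothetical sample of data glove readings; all field names are illustrative.
    flex_angles: List[float]        # flexion angle (degrees) at each instrumented finger joint
    abduction_angles: List[float]   # angles (degrees) between adjacent fingers
    palm_arch: float                # bending of the palm (degrees)
    position: Tuple[float, float, float]     # hand position (x, y, z) from a magnetic/acoustic tracker
    orientation: Tuple[float, float, float]  # hand orientation (roll, pitch, yaw) in degrees

def flex_angle_from_adc(raw: int, raw_flat: int = 310, raw_bent: int = 620,
                        angle_bent: float = 90.0) -> float:
    # Map a raw flex sensor reading to an approximate joint angle using a linear
    # calibration between an assumed 'flat' and 'fully bent' reading; real gloves
    # require per-user, per-sensor calibration.
    fraction = (raw - raw_flat) / (raw_bent - raw_flat)
    return max(0.0, min(angle_bent, fraction * angle_bent))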

The sensor measurements corresponding to a hand posture are the cues provided to the information analysis unit. Depending on the application, the information analysis unit either directly interprets the hand posture or maps it to an animated hand that mirrors the shape of the user's hand posture. There are different types of data gloves designed for specific applications. Detailed surveys on the types of data gloves developed so far and their applications are given in [22], [24] and [25]. The design of a data glove varies with the sensor technology, the number of sensors and the sensor precision [22]. The types of sensors used in instrumented gloves include accelerometers, conductive pads, Hall effect sensors, capacitive bend sensors, piezo-sensitive sensors, resistive ink sensors and fibre optic sensors.
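As a rough illustration of how an information analysis unit might directly interpret such measurements, the Python sketch below matches a vector of finger flexion angles against stored posture templates by nearest Euclidean distance. The templates, the five-finger encoding and the acceptance threshold are invented for illustration only.

import math

# Hypothetical posture templates: average flexion angles (degrees) per finger,
# ordered thumb, index, middle, ring, little.
POSTURE_TEMPLATES = {
    "open_hand":   [5, 5, 5, 5, 5],
    "closed_fist": [80, 90, 90, 90, 90],
    "point":       [80, 10, 90, 90, 90],
}

def interpret_posture(flex_angles, max_distance=60.0):
    # Return the template whose flexion profile is closest to the measurement,
    # or None if nothing is close enough (the threshold is an illustrative choice).
    best_name, best_dist = None, float("inf")
    for name, template in POSTURE_TEMPLATES.items():
        dist = math.dist(flex_angles, template)   # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

print(interpret_posture([78, 12, 85, 92, 88]))   # -> "point"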

The Sayre glove, developed by Thomas DeFanti and Daniel Sandin in 1977, was the first instrumented glove.

Figure 1.4: Sensor based glove interfaces. (a) DataGlove. Image courtesy www.dipity.com; (b) CyberGlove II; (c) Example of hand gesture animation using CyberGlove II. Copyright © 2011 CyberGlove Systems LLC. All rights reserved; (d) 5DT data glove. Image courtesy www.5dt.com; (e) Humanglove. Image courtesy Humanware (www.hmw.it); and (f) Pinch glove. Image courtesy Fakespace Labs (www.fakespacelabs.com).

The glove consists of light based sensors to measure the finger flexion and was designed for multidimensional control of sliders and other two dimensional (2D) widgets [24]. The digital data entry glove, developed in 1983 by the Bell Telephone Laboratories, was the first glove designed for manual data entry using single-hand postures in sign language [26]. The glove consists of optical sensors for measuring the finger flexion, conductive pads for sensing proximity, and tilt and inertial sensors for measuring the orientation and the position of the hand respectively. In 1987, Zimmerman et al. [27] developed the DataGlove for manipulating 3D virtual objects with hand gestures. The device consists of fibre optic sensors to measure the finger flexion and magnetic sensors to measure the orientation of the hand. The DataGlove was the first commercially successful device of this kind and has been widely used.

James Kramer developed the CyberGlove in 1991 to translate American Sign Language into spoken English [28]. The CyberGlove was commercialised by Virtual Technologies and is one of the leading instrumented gloves in terms of accuracy [22]. The CyberGlove consists of piezo-sensitive sensors to measure the flexion, abduction and adduction at the finger joints and the wrist [29].

The 5DT data glove is another successful glove system, developed by Fifth Dimension Technologies [30].


Figure 1.5: Illustration of the monocular vision based interface unit for CBA systems, in which a single camera connected to the computer captures the user's hand gesture.

The 5DT data glove consists of fibre optic sensors for measuring the joint movements of the hand [31]. Other commercially available glove systems include the Humanglove [32] and the Pinch glove [33]. The Humanglove consists of Hall effect sensors to measure the joint movements [34], whereas the Pinch glove consists of two or more electrical contacts placed at specific parts of the hands; when a hand posture is made, the electrical contacts meet to complete a conductive path [34].
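Since a pinch-style glove reports only which contacts are touching, posture decoding reduces to a lookup over the closed contact pairs. The Python sketch below illustrates this with an invented contact layout and command vocabulary; it is not the actual Pinch glove protocol.

# Hypothetical mapping from closed contact pairs to commands; a real pinch
# glove defines its own contact layout and gesture vocabulary.
PINCH_COMMANDS = {
    frozenset({"thumb", "index"}):  "select",
    frozenset({"thumb", "middle"}): "menu",
    frozenset({"thumb", "little"}): "cancel",
}

def decode_pinch(closed_contacts):
    # Return the command for the set of fingertips currently in contact,
    # or None if the combination is not part of the vocabulary.
    return PINCH_COMMANDS.get(frozenset(closed_contacts))

print(decode_pinch({"thumb", "index"}))   # -> "select"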

These sensor based glove interfaces facilitate accurate interpretation or mapping of the hand postures and hence, they find wide applications in sign-to-speech/text translation systems [35, 36], animation [37–39] and virtual reality [40–42].

1.3.2 Vision based interfaces

The vision based interfaces for CBA involve the acquisition of hand postures using one or more cameras connected to the computer [43]. A vision based system using a single camera is referred to as a monocular vision system, and one with multiple cameras is referred to as a multi-vision system. The schematic diagram of a monocular vision based interface setup for CBA systems is shown in Figure 1.5.

Unlike the sensor based interface, a computer vision method does not permit direct measurement of the hand posture parameters; hence, the images of the hand postures are the only cues provided to the information analysis unit. The information analysis unit employs image processing techniques for modelling and estimating the hand postures from the acquired hand posture image. The key factor in a vision based interface is to ensure sufficient visibility, so that the hand posture and the parameters pertaining to it are properly defined to the computer [44]. Accordingly, the camera's angle of view with respect to the user's hand should be chosen in such a way that there is no self-occlusion between the fingers and the shape of the hand is accurately captured [44]. During real-time operation, the suitable angle of view varies with every hand posture. Thus, in order to accurately recover the hand posture, the vision based interface should employ either one moving camera or multiple still cameras to capture the posture images from different angles of view. However, a single moving camera is not a feasible solution in most practical CBA applications. Hence, multiple cameras are placed at different angles of view to accurately capture the hand posture [45]. Bebis et al. [46] have employed one moving camera and multiple still cameras for HCI in virtual environments.

The multi-vision system offers the advantages of accurate reconstruction of the hand posture and the elimination of occlusion [44]. As a result, multi-vision systems are successful in higher-end applications such as robotics, virtual reality, 3D object manipulation and animation. Despite these advantages, the multi-vision based interface is resource-intensive and requires computationally complex algorithms for hand pose estimation [44, 47]. Due to the difficulties associated with multi-vision systems, monocular vision based interfaces are widely employed.

In a monocular vision based system, the hand postures are acquired using one camera, and the visual features extracted through image processing techniques are used for the interpretation of the hand postures. Several researchers have already shown that a hand posture image acquired at one angle of view is accurate and effective for HCI. Further, the development of estimation methods [48] for 3D reconstruction from a 2D image encourages the use of the monocular vision based interface in high-end applications. Accordingly, several estimation methods have been proposed for reconstructing 3D hand postures from the corresponding 2D images [49–54]. The reduced computational complexity and the availability of image processing algorithms for accurate modelling and interpretation of 2D images make the monocular vision based interface more suitable for real time CBA systems.
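As a minimal sketch of such a monocular pipeline, the Python code below (assuming OpenCV 4.x) grabs one frame from a single camera and extracts a crude hand silhouette by skin-colour thresholding in HSV space. The colour bounds and the choice of segmentation method are illustrative assumptions, not the specific methods cited above.

import cv2
import numpy as np

# Illustrative HSV skin-colour bounds; in practice these depend on the user,
# the lighting and the camera, and more robust segmentation is preferred.
SKIN_LOW  = np.array([0, 30, 60], dtype=np.uint8)
SKIN_HIGH = np.array([25, 180, 255], dtype=np.uint8)

def extract_hand_silhouette(frame):
    # Return the largest skin-coloured contour in the frame, or None.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # OpenCV 4.x returns (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)          # single (monocular) camera
    ok, frame = cap.read()
    cap.release()
    if ok:
        hand = extract_hand_silhouette(frame)
        if hand is not None:
            # Simple geometric cues: area and bounding box of the silhouette.
            x, y, w, h = cv2.boundingRect(hand)
            print("hand area:", cv2.contourArea(hand), "bbox:", (x, y, w, h))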

1.3.3 Merits of vision based interfaces over sensor based interfaces

The choice of the type of interface depends on the requirements of the CBA system, such as accuracy, the size of the gesture vocabulary, ease of interaction and adaptability. In this context, the sensor based interfaces facilitate precise estimation of the posture parameters and modelling/interpretation of the hand postures [47].

As a result, the sensor based interfaces are capable of accurately interpreting a large gesture vocabulary that includes hand postures with only minor differences.

Despite these advantages, the sensor based glove interfaces are obtrusive and hinder the naturalness of the user's interaction with the computer. They are not user adaptive, and they require the device to be calibrated with