Publication Repository Artikel Thailand

(1)

(2)

Augmented Reality 3D Eyeglasses

Frame Simulator Using Active Shape Model

and Real Time Face Tracking

Endang Setyati, Yosi Kristian, Yuliana Melita Pranoto Department of Information Technology Sekolah Tinggi Teknik Surabaya (STTS)

Surabaya, Indonesia.

[email protected], [email protected], [email protected]

David Alexandre

Department of Industrial Management National Taiwan University of Science and Technology

Taipei City, Taiwan

[email protected]

Abstract—Combination of real-world object and computer-generated object know as Augmented Reality (AR) is considered media of the future. In AR the computer-generated object is a result of a three-dimensional graphics rendering. An AR software designed to provide real-time interactivity with the user in forms of video. In this paper we implement AR technology to simulate eyeglasses frame model simulation so user can try many eyeglasses frame models and see how they look without actually have to try the eyeglasses. After activating this software, users in the webcam’s range will be processed automatically, a 3D eyeglasses frame model will be positioned in the user’s face. To develop this software we must first implement a fast and effective face tracking algorithm so user can feel the AR as a real object. The speed of face detection or face tracking used in this software will have a significant effect on user experience. On the other hand the higher the video resolution the slower the face tracking algorithm will be. In this paper we use a medium video resolution (640 x 480) so we can increase the speed of face tracking algorithm to make the frame per second user experience acceptable. This simulation is an example of an effective AR technology usage.

Keywords—Technology Innovation, Augmented Reality, Face Tracking, Eyeglasses Frame Simulator.

I. INTRODUCTION

Face tracking (real time face detection) algorithm is one of the fast-growing subjects in the computer vision. Numerous algorithm and methods have been applied to accomplish a fast and robust face tracking system. A good face tracking system is able to track human faces on video stream with acceptable frame rates and to detect only human face, not a human alike face.

An AR system supplements the real world with virtual (computer-generated) objects that appear to coexist in the same space as the real world [1]. In AR implementation generally, the media used is a video camera and a marker. By determining the position detected form the marker, AR system can position the computer generated object to look like a real object in real life.

One of a significant breakthrough is in two dimensional face processing. The result of this research is a possibility to process a two dimensional (2D) face data into a three

II. FACE TRACKING ON VIDEO

A. Augmented Reality

In concept, AR is a real world condition which modified by computer into a new condition. Input used in AR varies from sound, video, image, to GPS. In the larger scale, AR need various sensors and special environment. The way of AR working is involving an input which is a position or coordinates in real environment and output which is a generated object from AR calculation. The AR generated object is a result from a 3D object.

Generally AR can be divided into two groups, marker-less tracking and marker-based tracking. Marker-based detection on augmented reality will involves marker as a basis for defining input. Therefore, one of the algorithms used in this simulation will be a marker tracking algorithm for finding a marker type and position for generating AR object with accurate position. The result is a picture from 3D object which its position adjusted based on the marker position. Of course, the user is required to have the marker mentioned above for activating AR object.

In marker-less tracking, AR defines its input by using algorithms such as object tracking, hand tracking, or face tracking. The result for these algorithms will be used to define the position of AR generated object. In its development, various methods and algorithms have been used for generating more accurate and better AR. In addition, there are also a significant enhancement in the supporting devices, such as glasses for watching the AR result directly, etc.

B. Haar-Like Feature

Haar-like features are digital image features used in object recognition. This method works by changing a part of image (Region of Interest) into a value. This method is one of the common methods for face detection with high speed and accuracy. They owe their name to their intuitive similarity with Haar wavelets and commonly used in the real-time face detector.

(3)

B next step wi h pixel area. T h area for an o

lementation fo

F

Active Shape M

A landmark re he images und right eye pu ting landmark

A set of landm vectors of poi

larity transfo tion) that m ween shape p

ned training dmarked faces)

The ASM star pe aligned to t a global face s until conver

(i) Suggest hape points b und each point

) Conform t The individ shape mod matchers to

Active Shape del is iterativel he image [6]. T uired from a n

Given a guess matched with ameter b for P del can be defi

object. Instanc defining positi

X

re: Xc(Xc,Y

ree and scale n image’s fram

ar-like featur mage into pix ll be to calcu The resulting object detectio or face detecti

Figure 1. Haar Lik

Model

epresents a dis der observatio upil [2]. We ks.

marks forms a ints. We align form (allowin minimizes the

oints. The m shapes (whic ).

rts the search the position a detector. It t gence

a tentative sh by template m t

the tentative s dual template del pools the o form a stron

Model (ASM ly changed to This method umber of train

ed position in h the image.

Point Distribu fined in a coor

ce X in the im

with value of me model.

res working xel area based ulate the inten difference is u n. Example o ion can be see

ke Implementatio

stinguishable p on, for examp

can locate f

a shape. Shap n one shape ng translatio e average eu mean shape is

ch in our ca

h for landmark and size of th then repeats t

hape by adjus matching of

shape to a glo e matches are e results of th nger overall cl

M) [5] is a m o match that m

is using a flex ning data sam

n a picture, AS By choosing ution Model, rdinate frame mage’s model c

n, and scale.

used to categ f Haar-like fe en on Fig.1.

on.

point exist in ple, the locatio

facial feature

pes are represe to another w on, scaling,

uclidean dist the mean o ase are manu

ks from the m he face determ

the following

sting the loca the image tex

obal shape m unreliable an he weak tem lassifier.

method where model into a m xible model w mple.

SM iteratively g a set of s the shape o which center can be constru

is a rotation w s a center pos

Locating a bet the image; Updating par according the Deciding cons shape (ex: |bi| Repeat the st (convergent m between an ite

In practice, th nt for a better

the best matc y to get better ch have the h ng the profile. del’s point. Ev ng edges w istical model p

To get better h Gaussian Py zed and it is s ulate the ASM h different siz

he best result.

It should be c ays located i

se points can ge structure. T e found in figu

Pose from Ort

Pose from O SIT [7] is an a ition of particu m Dementhon

ect, at least fo required. The tion matrix a

rix R from

rdinates from mes from an ob

tive Shape M

tter position f

ameters (Xb, new position straints on pa

< 3√λi); teps until con means there eration and the

he iteration wil position and ch to these ne r position is t highest intens The location ven though, th which combin

point.

result, the Ac yramid. With

stored as temp M result from ze and then im

considered tha n highest int

be representin The best appr ure 2.

Figure 2. ASM

tography and S

Orthography a algorithm whi ular object in

on year 1995 our non-coplan erefore, a 3D

found will be ng perspectiv ction from sca otation matrix

e purpose of and translation

an object i three vectors bject coordina

Model works w

for points in ar

Yb, s, θ, b) found in step arameter b to

nvergent cond are no sign e iteration bef

ll look in the i update the m ew found posi to acquire the sity (with orie of this edge is he best locatio

ned by the

ctive Shape M subsampling, porary data. T m one image mage with the

at a point in tensity edge ng lower inten roach is by an

Implementation.

Scaling with I

and Scaling ich can be use

3D field. This [8]. For meas nar points in

projection fie e used for this ve projection

alated object. x and transla

f this algorith n vector of a

s a matrix s i, j, and k. T ate system whi

with steps suc

round the poin

shape and p 1;

ensure a matc

dition is achi nificant differ fore).

image around model paramete

ition. The sim e location of entation if kn s a new locati on should be o references

Model is comb the image w The next step

to another im e best location

the model do in local struc nsity edge or nalyzing what

Iteration

with Iteratio ed for estimat s algorithm or suring a pose the image’s o eld from 2D p algorithm’s i based on o The result o ation vector o

(4)

a ca rix is to calcu rdinate system Mi, i between f

te a vector pr rdinate. For e g M0Mi coord rdinate system

re iu, iv, iw ar u, M0v, M0w)

The rotation m and j in the ted from a slation vector and object r slation vector

Image 2 sh ogonal projec nt Mi which ha

The steps for P Calculating an

-multiply 3x( calculate I -calculate i an

i Recalculate th -Calculate k v -Calculate z vector with Z -Calculate:



nate system. ulate a camera

m such as M first row of th rojection to i v example, Xi –

re i coordinat . ction which a ave a referenc

POSIT algorit n object featu

1 0 0)  , (i

atively i, j, an ’ image vecto

1

(N-1) B matrix

and J vec

= B.X’ dan J scale from pr r normalization vector from th coordinate fr Z0 = f / s wher a coordinate sy

M0Mi the mu he matrix and

vector which – X0 coordina vector row loc gives result su



constructed f dinate system.

thm follow: ure points wh

1 

 ), n N ...

nd Z0:

or which N-1

1

vector with

N-1

0 

y ,(i ...N )

x with N-1 co ctors which

= B.Y’ rojection as a n. he cross produ from Z0 whic ystem from o ultiplication

M0Mi vector, located in ca ate from M0M

cated in the s ch as these:

coordinate sy

from a calcul . k vector wi j. While for n projection p e coordinates

1 coordinate

1

1 

), n

-1 coordinate

1

1 

), n N

oordinate vect consist of t

an average fro

2 ra’s focal leng

uld located in object projecti orithm.

Figu

Head Pose Est

The estimatio lication will b ive Shape Mo d for face dete ning the user’ d for acquiring face feature del.

ure 4. Points of Fa

In an augme mbination betw real world tured using we generated us

)

n1 is greate t step 3. han the thresh s value. tion vector is

trix with i, j, a

considered t the beginning on is shown i

ure 3. The result f

timation

on of user’s h be using these

odel, and PO ection, Active ’s face point, g a user’s face points which

ace Feature Used

III. RESULT A

ented reality a ween a real wo

condition wh ebcam. While sing compute

er than the thr

hold then disp

s OM0 = Om

and k vector r

that the objec g of coordinat

n the shape of

from POSIT algo

head pose [9 e algorithms: H

SIT. Haar-lik e Shape Mode and POSIT e. Poses will b h generated

for Projection in

AND TESTING

application, the orld condition

here the user e the virtual w er graphics.

reshold then n

play the Pose

m0/s and rot

ows.

ct reference te axis. On fig f cube with PO

orithm.

] when using Haar-like feat ke features wi el will be use algorithm wi be calculated b by Active S

n POSIT Algorithm

e user will wa n and virtual w r located wi world condition

(5)

happen in combining both of this world will involving several algorithms mentioned before.

In this application the user will have to positioned in front of the webcam and activate it. After that, the display of user’s face will be automatically displayed with virtual eyeglasses frame. With the help of webcam, this simulator software will be able to operate real time just like a realistic mirror for user. There are several things that acquired after the testing is conducted.

A. Face Detection and ASM Accuracy

In the next testing, it will be seen that face tracking using Active Shape Model which used in the software will have significant speed change based on input image resolution.

By several testing with different input image resolution and same proporsional resolution, it is concluded that low resolution image will increase the speed of the software and the number of frame per second for the output. Table 1 shows the fps with different resolution.

TABLE I. FRAME PER SECOND COMPARISON

Resolution Frame Per Second (FPS)

320 x 240 15 – 18

480 x 360 12 – 15

640 x 480 9 – 11

800 x 600 6 – 8

The number of iteration for Active Shape Model process to get the best point location not depends on the image size itself. From several testing, it is known that the size from the input image does not have a significant impact with the number of iteration. It is because of the effect of subsampling that used in the face tracking process.

For the example, the number of iteration from image with 640x480 resolution is not too differ than image with 480x360 resolution. Table 2 shows the effect of resolution to the numbers of iteration.

TABLE II. NUMBER OF ITERATION COMPARISON.

Resolution Number of iteration

320 x 240 10-16

480 x 360 12-15

640 x 480 11-17

800 x 600 14-20

The accuracy of Active Shape Model depends on several factors, such as brightness, image sharpness, and noise. For brightness, it is known that the image brightness intensity will affect the accuracy of detection. The comparison between brightness intensity and active shape model result will be shown in figure 5.

Through testing, it is found out that the best brightness for this face detection is an average one. A high brightness or low brightness will decrease the accuracy of the face detection.

Figure 5. Face Detection in Several Types of Brightness.

The next testing will be involved with object in image. Often, there are several objects in the image that can block the face. With Active Shape Model algorithm then the face detection will have increased accuracy even if parts of the face is blocked by object. It should be remembered that the maximum face point feature that blocked by objects is ¼. The result of the face detection can be seen in figure 6.

Figure 6. Face Detection in Partly Blocked Face.

B. Head Pose Estimation Accuracy

The testing for the user’s head pose is using POSIT algorithm which resulted in rotation matrix and translation vector as the output. The result from POSIT algorithm is acceptable even though it is not precise since the feature face points detection with Active Shape Model is not constant. Therefore, the output for the user’s head pose is not precise.

(6)

By testing, the simulator software developed is acceptable in the terms of speed and accuracy. Eyeglasses frame provided in the simulator is constructed using OpenGL and located in the correct place according the face detection result. The pose shown before is created by masking methods. The result of the application can be shown in figure 8.

Figure 7. User’s Head Pose Estimation

Figure 8. The result of eyeglasses frame simulator software.

IV. CONCLUSION

By analyze the implementation of AR, several things can be concluded:

1. Augmented Reality application should be supported by correct algorithm since it is commonly requires real-time processing.

2. Active shape model algorithm have a particular error level because shape model factor used and algorithm’s efficiency.

3. The choice of correct input image in the implementation of active shape model will have a significant impact on the output. Filtering and the environment where the image is taken are important factors for achieving best results in the active shape model implementation.

4. The quality of POSIT algorithm’s output will have a high dependency on the quality of its input image.

5. The user movement speed have great impact on frame positioning accuracy because of the ASM and POSIT processing delay.

6. For high resolution video ASM and POSIT will need more time to process and cause some delay.

ACKNOWLEDGMENT

This work was supported and fully funded by Directorate General of Higher Education, Ministry of Education and Culture of Indonesia.

REFERENCES

[1] Ronald Azuma, Recent Advances in Augmented Reality, IEEE Transaction on Computer Graphics and Application, 1997.

[2] Stephen Milborrow and Fred Nicolls, Locating Facial Features with an Extended Active Shape Model, Computer Vision ECCV, Springer Berlin Heidelberg, 2008.

[3] Paul Viola, Michael J. Jones, Rapid Object Detection using Boosted Cascade of Simple Features, IEEE CVPR, 2001.

[4] T. F. Cootes and C.J. Taylor. Statistical Models of Appearance for Computer Vision. University of Manchester, 2004.

[5] T. F. Cootes, C.J. Taylor, D.H. Cooper, J. Graham. Active Shape Models –Their Training and Application, 1995.

[6] T. F. Cootes, Active Shape Models – Smart Snakes, British Machine Vision Conference, 1992.

[7] Marco Treiber, An Introduction to Object Recognition Selected Algorithms for a Wide Variety of Applications, Springer London Dordrecht Heidelberg New York, ISSN 1617-7916 ISBN 978-1-84996-234-6 e-ISBN 978-1-84996-235-3 DOI 10.1007/978-1-84996-235-3, 2010.

[8] D. Dementhon and Larry S. D., Model-Based Object Pose in 25 Lines of Code, International Journal of Computer Vision, 15, pp. 123-141, June 1995.

[9] P. Martins and J. Batista, Monocular Head Pose Estimation, Paper presented at International Conference on Image Analysis and Recognition, 2008.

[10] K. Baker, Singular Value Decomposition Tutorial, The Ohio State University, 2005.

[11] R. Gonzales and R. Woods, Digital Image Processing, second edition. Prentice Hall, 2002.

[12] Haralick, Analysis and Solutions of the Three Point Perspective Pose Estimation Problem, Proc. IEEE Conf. Computer Vision and Pattern Recognition, Maui, Hawaii, 1991.

[13] Paul Viola, Michael J. Jones, Robust Real-Time Face Detection, Proc. International Journal of Computer Vision 57 (2), Kluwer Academic Publishers. Manufactured in The Netherlands, 2004, p.137–154. [14] Ed Angel, OpenGL Transformations: Interactive Computer Graphics

4E, Addison-Wesley, 2005.

[15] Lars Kruger, Model Based Object Classification and Localisation in Multiocular Images, Dissertation, November 2007.

[16] F. Abdat, C. Maaoui, and A. Pruski, Real Facial Feature Points Tracking With Pyramidal Lucas-Kanade Algorithm, IEEE RO-MAN08, The 17th International Symposium on Robot and Human Interactive Communication, Germany, 2008.

(7)