Augmented Reality 3D Eyeglasses
Frame Simulator Using Active Shape Model
and Real Time Face Tracking
Endang Setyati, Yosi Kristian, Yuliana Melita Pranoto Department of Information Technology Sekolah Tinggi Teknik Surabaya (STTS)
Surabaya, Indonesia.
[email protected], [email protected], [email protected]
David Alexandre
Department of Industrial Management National Taiwan University of Science and Technology
Taipei City, Taiwan
Abstract—Combination of real-world object and computer-generated object know as Augmented Reality (AR) is considered media of the future. In AR the computer-generated object is a result of a three-dimensional graphics rendering. An AR software designed to provide real-time interactivity with the user in forms of video. In this paper we implement AR technology to simulate eyeglasses frame model simulation so user can try many eyeglasses frame models and see how they look without actually have to try the eyeglasses. After activating this software, users in the webcam’s range will be processed automatically, a 3D eyeglasses frame model will be positioned in the user’s face. To develop this software we must first implement a fast and effective face tracking algorithm so user can feel the AR as a real object. The speed of face detection or face tracking used in this software will have a significant effect on user experience. On the other hand the higher the video resolution the slower the face tracking algorithm will be. In this paper we use a medium video resolution (640 x 480) so we can increase the speed of face tracking algorithm to make the frame per second user experience acceptable. This simulation is an example of an effective AR technology usage.
Keywords—Technology Innovation, Augmented Reality, Face Tracking, Eyeglasses Frame Simulator.
I. INTRODUCTION
Face tracking (real time face detection) algorithm is one of the fast-growing subjects in the computer vision. Numerous algorithm and methods have been applied to accomplish a fast and robust face tracking system. A good face tracking system is able to track human faces on video stream with acceptable frame rates and to detect only human face, not a human alike face.
An AR system supplements the real world with virtual (computer-generated) objects that appear to coexist in the same space as the real world [1]. In AR implementation generally, the media used is a video camera and a marker. By determining the position detected form the marker, AR system can position the computer generated object to look like a real object in real life.
One of a significant breakthrough is in two dimensional face processing. The result of this research is a possibility to process a two dimensional (2D) face data into a three
II. FACE TRACKING ON VIDEO
A. Augmented Reality
In concept, AR is a real world condition which modified by computer into a new condition. Input used in AR varies from sound, video, image, to GPS. In the larger scale, AR need various sensors and special environment. The way of AR working is involving an input which is a position or coordinates in real environment and output which is a generated object from AR calculation. The AR generated object is a result from a 3D object.
Generally AR can be divided into two groups, marker-less tracking and marker-based tracking. Marker-based detection on augmented reality will involves marker as a basis for defining input. Therefore, one of the algorithms used in this simulation will be a marker tracking algorithm for finding a marker type and position for generating AR object with accurate position. The result is a picture from 3D object which its position adjusted based on the marker position. Of course, the user is required to have the marker mentioned above for activating AR object.
In marker-less tracking, AR defines its input by using algorithms such as object tracking, hand tracking, or face tracking. The result for these algorithms will be used to define the position of AR generated object. In its development, various methods and algorithms have been used for generating more accurate and better AR. In addition, there are also a significant enhancement in the supporting devices, such as glasses for watching the AR result directly, etc.
B. Haar-Like Feature
Haar-like features are digital image features used in object recognition. This method works by changing a part of image (Region of Interest) into a value. This method is one of the common methods for face detection with high speed and accuracy. They owe their name to their intuitive similarity with Haar wavelets and commonly used in the real-time face detector.
B next step wi h pixel area. T h area for an o
lementation fo
F
Active Shape M
A landmark re he images und right eye pu ting landmark
A set of landm vectors of poi
larity transfo tion) that m ween shape p
ned training dmarked faces)
The ASM star pe aligned to t a global face s until conver
(i) Suggest hape points b und each point
) Conform t The individ shape mod matchers to
Active Shape del is iterativel he image [6]. T uired from a n
Given a guess matched with ameter b for P del can be defi
object. Instanc defining positi
X
re: Xc(Xc,Y
ree and scale n image’s fram
ar-like featur mage into pix ll be to calcu The resulting object detectio or face detecti
Figure 1. Haar Lik
Model
epresents a dis der observatio upil [2]. We ks.
marks forms a ints. We align form (allowin minimizes the
oints. The m shapes (whic ).
rts the search the position a detector. It t gence
a tentative sh by template m t
the tentative s dual template del pools the o form a stron
Model (ASM ly changed to This method umber of train
ed position in h the image.
Point Distribu fined in a coor
ce X in the im
with value of me model.
res working xel area based ulate the inten difference is u n. Example o ion can be see
ke Implementatio
stinguishable p on, for examp
can locate f
a shape. Shap n one shape ng translatio e average eu mean shape is
ch in our ca
h for landmark and size of th then repeats t
hape by adjus matching of
shape to a glo e matches are e results of th nger overall cl
M) [5] is a m o match that m
is using a flex ning data sam
n a picture, AS By choosing ution Model, rdinate frame mage’s model c
n, and scale.
used to categ f Haar-like fe en on Fig.1.
on.
point exist in ple, the locatio
facial feature
pes are represe to another w on, scaling,
uclidean dist the mean o ase are manu
ks from the m he face determ
the following
sting the loca the image tex
obal shape m unreliable an he weak tem lassifier.
method where model into a m xible model w mple.
SM iteratively g a set of s the shape o which center can be constru
is a rotation w s a center pos
Locating a bet the image; Updating par according the Deciding cons shape (ex: |bi| Repeat the st (convergent m between an ite
In practice, th nt for a better
the best matc y to get better ch have the h ng the profile. del’s point. Ev ng edges w istical model p
To get better h Gaussian Py zed and it is s ulate the ASM h different siz
he best result.
It should be c ays located i
se points can ge structure. T e found in figu
Pose from Ort
Pose from O SIT [7] is an a ition of particu m Dementhon
ect, at least fo required. The tion matrix a
rix R from
rdinates from mes from an ob
tive Shape M
tter position f
ameters (Xb, new position straints on pa
< 3√λi); teps until con means there eration and the
he iteration wil position and ch to these ne r position is t highest intens The location ven though, th which combin
point.
result, the Ac yramid. With
stored as temp M result from ze and then im
considered tha n highest int
be representin The best appr ure 2.
Figure 2. ASM
tography and S
Orthography a algorithm whi ular object in
on year 1995 our non-coplan erefore, a 3D
found will be ng perspectiv ction from sca otation matrix
e purpose of and translation
an object i three vectors bject coordina
Model works w
for points in ar
Yb, s, θ, b) found in step arameter b to
nvergent cond are no sign e iteration bef
ll look in the i update the m ew found posi to acquire the sity (with orie of this edge is he best locatio
ned by the
ctive Shape M subsampling, porary data. T m one image mage with the
at a point in tensity edge ng lower inten roach is by an
Implementation.
Scaling with I
and Scaling ich can be use
3D field. This [8]. For meas nar points in
projection fie e used for this ve projection
alated object. x and transla
f this algorith n vector of a
s a matrix s i, j, and k. T ate system whi
with steps suc
round the poin
shape and p 1;
ensure a matc
dition is achi nificant differ fore).
image around model paramete
ition. The sim e location of entation if kn s a new locati on should be o references
Model is comb the image w The next step
to another im e best location
the model do in local struc nsity edge or nalyzing what
Iteration
with Iteratio ed for estimat s algorithm or suring a pose the image’s o eld from 2D p algorithm’s i based on o The result o ation vector o
a ca rix is to calcu rdinate system Mi, i between f
te a vector pr rdinate. For e g M0Mi coord rdinate system
re iu, iv, iw ar u, M0v, M0w)
The rotation m and j in the ted from a slation vector and object r slation vector
Image 2 sh ogonal projec nt Mi which ha
The steps for P Calculating an
-multiply 3x( calculate I -calculate i an
i Recalculate th -Calculate k v -Calculate z vector with Z -Calculate:
nate system. ulate a camera
m such as M first row of th rojection to i v example, Xi –
re i coordinat . ction which a ave a referenc
POSIT algorit n object featu
1 0 0) , (i
atively i, j, an ’ image vecto
1
(N-1) B matrix
and J vec
= B.X’ dan J scale from pr r normalization vector from th coordinate fr Z0 = f / s wher a coordinate sy
M0Mi the mu he matrix and
vector which – X0 coordina vector row loc gives result su
constructed f dinate system.
thm follow: ure points wh
1
), n N ...
nd Z0:
or which N-1
1
vector with
N-1
0
y ,(i ...N )
x with N-1 co ctors which
= B.Y’ rojection as a n. he cross produ from Z0 whic ystem from o ultiplication
M0Mi vector, located in ca ate from M0M
cated in the s ch as these:
coordinate sy
from a calcul . k vector wi j. While for n projection p e coordinates
1 coordinate
1
1
), n
-1 coordinate
1
1
), n N
oordinate vect consist of t
an average fro
2 ra’s focal leng
uld located in object projecti orithm.
Figu
Head Pose Est
The estimatio lication will b ive Shape Mo d for face dete ning the user’ d for acquiring face feature del.
ure 4. Points of Fa
In an augme mbination betw real world tured using we generated us
)
n1 is greate t step 3. han the thresh s value. tion vector is
trix with i, j, a
considered t the beginning on is shown i
ure 3. The result f
timation
on of user’s h be using these
odel, and PO ection, Active ’s face point, g a user’s face points which
ace Feature Used
III. RESULT A
ented reality a ween a real wo
condition wh ebcam. While sing compute
er than the thr
hold then disp
s OM0 = Om
and k vector r
that the objec g of coordinat
n the shape of
from POSIT algo
head pose [9 e algorithms: H
SIT. Haar-lik e Shape Mode and POSIT e. Poses will b h generated
for Projection in
AND TESTING
application, the orld condition
here the user e the virtual w er graphics.
reshold then n
play the Pose
m0/s and rot
ows.
ct reference te axis. On fig f cube with PO
orithm.
] when using Haar-like feat ke features wi el will be use algorithm wi be calculated b by Active S
n POSIT Algorithm
e user will wa n and virtual w r located wi world condition
happen in combining both of this world will involving several algorithms mentioned before.
In this application the user will have to positioned in front of the webcam and activate it. After that, the display of user’s face will be automatically displayed with virtual eyeglasses frame. With the help of webcam, this simulator software will be able to operate real time just like a realistic mirror for user. There are several things that acquired after the testing is conducted.
A. Face Detection and ASM Accuracy
In the next testing, it will be seen that face tracking using Active Shape Model which used in the software will have significant speed change based on input image resolution.
By several testing with different input image resolution and same proporsional resolution, it is concluded that low resolution image will increase the speed of the software and the number of frame per second for the output. Table 1 shows the fps with different resolution.
TABLE I. FRAME PER SECOND COMPARISON
Resolution Frame Per Second (FPS)
320 x 240 15 – 18
480 x 360 12 – 15
640 x 480 9 – 11
800 x 600 6 – 8
The number of iteration for Active Shape Model process to get the best point location not depends on the image size itself. From several testing, it is known that the size from the input image does not have a significant impact with the number of iteration. It is because of the effect of subsampling that used in the face tracking process.
For the example, the number of iteration from image with 640x480 resolution is not too differ than image with 480x360 resolution. Table 2 shows the effect of resolution to the numbers of iteration.
TABLE II. NUMBER OF ITERATION COMPARISON.
Resolution Number of iteration
320 x 240 10-16
480 x 360 12-15
640 x 480 11-17
800 x 600 14-20
The accuracy of Active Shape Model depends on several factors, such as brightness, image sharpness, and noise. For brightness, it is known that the image brightness intensity will affect the accuracy of detection. The comparison between brightness intensity and active shape model result will be shown in figure 5.
Through testing, it is found out that the best brightness for this face detection is an average one. A high brightness or low brightness will decrease the accuracy of the face detection.
Figure 5. Face Detection in Several Types of Brightness.
The next testing will be involved with object in image. Often, there are several objects in the image that can block the face. With Active Shape Model algorithm then the face detection will have increased accuracy even if parts of the face is blocked by object. It should be remembered that the maximum face point feature that blocked by objects is ¼. The result of the face detection can be seen in figure 6.
Figure 6. Face Detection in Partly Blocked Face.
B. Head Pose Estimation Accuracy
The testing for the user’s head pose is using POSIT algorithm which resulted in rotation matrix and translation vector as the output. The result from POSIT algorithm is acceptable even though it is not precise since the feature face points detection with Active Shape Model is not constant. Therefore, the output for the user’s head pose is not precise.
By testing, the simulator software developed is acceptable in the terms of speed and accuracy. Eyeglasses frame provided in the simulator is constructed using OpenGL and located in the correct place according the face detection result. The pose shown before is created by masking methods. The result of the application can be shown in figure 8.
Figure 7. User’s Head Pose Estimation
Figure 8. The result of eyeglasses frame simulator software.
IV. CONCLUSION
By analyze the implementation of AR, several things can be concluded:
1. Augmented Reality application should be supported by correct algorithm since it is commonly requires real-time processing.
2. Active shape model algorithm have a particular error level because shape model factor used and algorithm’s efficiency.
3. The choice of correct input image in the implementation of active shape model will have a significant impact on the output. Filtering and the environment where the image is taken are important factors for achieving best results in the active shape model implementation.
4. The quality of POSIT algorithm’s output will have a high dependency on the quality of its input image.
5. The user movement speed have great impact on frame positioning accuracy because of the ASM and POSIT processing delay.
6. For high resolution video ASM and POSIT will need more time to process and cause some delay.
ACKNOWLEDGMENT
This work was supported and fully funded by Directorate General of Higher Education, Ministry of Education and Culture of Indonesia.
REFERENCES
[1] Ronald Azuma, Recent Advances in Augmented Reality, IEEE Transaction on Computer Graphics and Application, 1997.
[2] Stephen Milborrow and Fred Nicolls, Locating Facial Features with an Extended Active Shape Model, Computer Vision ECCV, Springer Berlin Heidelberg, 2008.
[3] Paul Viola, Michael J. Jones, Rapid Object Detection using Boosted Cascade of Simple Features, IEEE CVPR, 2001.
[4] T. F. Cootes and C.J. Taylor. Statistical Models of Appearance for Computer Vision. University of Manchester, 2004.
[5] T. F. Cootes, C.J. Taylor, D.H. Cooper, J. Graham. Active Shape Models –Their Training and Application, 1995.
[6] T. F. Cootes, Active Shape Models – Smart Snakes, British Machine Vision Conference, 1992.
[7] Marco Treiber, An Introduction to Object Recognition Selected Algorithms for a Wide Variety of Applications, Springer London Dordrecht Heidelberg New York, ISSN 1617-7916 ISBN 978-1-84996-234-6 e-ISBN 978-1-84996-235-3 DOI 10.1007/978-1-84996-235-3, 2010.
[8] D. Dementhon and Larry S. D., Model-Based Object Pose in 25 Lines of Code, International Journal of Computer Vision, 15, pp. 123-141, June 1995.
[9] P. Martins and J. Batista, Monocular Head Pose Estimation, Paper presented at International Conference on Image Analysis and Recognition, 2008.
[10] K. Baker, Singular Value Decomposition Tutorial, The Ohio State University, 2005.
[11] R. Gonzales and R. Woods, Digital Image Processing, second edition. Prentice Hall, 2002.
[12] Haralick, Analysis and Solutions of the Three Point Perspective Pose Estimation Problem, Proc. IEEE Conf. Computer Vision and Pattern Recognition, Maui, Hawaii, 1991.
[13] Paul Viola, Michael J. Jones, Robust Real-Time Face Detection, Proc. International Journal of Computer Vision 57 (2), Kluwer Academic Publishers. Manufactured in The Netherlands, 2004, p.137–154. [14] Ed Angel, OpenGL Transformations: Interactive Computer Graphics
4E, Addison-Wesley, 2005.
[15] Lars Kruger, Model Based Object Classification and Localisation in Multiocular Images, Dissertation, November 2007.
[16] F. Abdat, C. Maaoui, and A. Pruski, Real Facial Feature Points Tracking With Pyramidal Lucas-Kanade Algorithm, IEEE RO-MAN08, The 17th International Symposium on Robot and Human Interactive Communication, Germany, 2008.
happen in combining both of this world will involving several algorithms mentioned before.
In this application the user will have to positioned in front of the webcam and activate it. After that, the display of user’s face will be automatically displayed with virtual eyeglasses frame. With the help of webcam, this simulator software will be able to operate real time just like a realistic mirror for user. There are several things that acquired after the testing is conducted.
A. Face Detection and ASM Accuracy
In the next testing, it will be seen that face tracking using Active Shape Model which used in the software will have significant speed change based on input image resolution.
By several testing with different input image resolution and same proporsional resolution, it is concluded that low resolution image will increase the speed of the software and the number of frame per second for the output. Table 1 shows the fps with different resolution.
TABLE I. FRAME PER SECOND COMPARISON
Resolution Frame Per Second (FPS)
320 x 240 15 – 18
480 x 360 12 – 15
640 x 480 9 – 11
800 x 600 6 – 8
The number of iteration for Active Shape Model process to get the best point location not depends on the image size itself. From several testing, it is known that the size from the input image does not have a significant impact with the number of iteration. It is because of the effect of subsampling that used in the face tracking process.
For the example, the number of iteration from image with 640x480 resolution is not too differ than image with 480x360 resolution. Table 2 shows the effect of resolution to the numbers of iteration.
TABLE II. NUMBER OF ITERATION COMPARISON.
Resolution Number of iteration
320 x 240 10-16
480 x 360 12-15
640 x 480 11-17
800 x 600 14-20
The accuracy of Active Shape Model depends on several factors, such as brightness, image sharpness, and noise. For brightness, it is known that the image brightness intensity will affect the accuracy of detection. The comparison between brightness intensity and active shape model result will be shown in figure 5.
Through testing, it is found out that the best brightness for this face detection is an average one. A high brightness or low brightness will decrease the accuracy of the face detection.
Figure 5. Face Detection in Several Types of Brightness.
The next testing will be involved with object in image. Often, there are several objects in the image that can block the face. With Active Shape Model algorithm then the face detection will have increased accuracy even if parts of the face is blocked by object. It should be remembered that the maximum face point feature that blocked by objects is ¼. The result of the face detection can be seen in figure 6.
Figure 6. Face Detection in Partly Blocked Face.
B. Head Pose Estimation Accuracy
The testing for the user’s head pose is using POSIT algorithm which resulted in rotation matrix and translation vector as the output. The result from POSIT algorithm is acceptable even though it is not precise since the feature face points detection with Active Shape Model is not constant. Therefore, the output for the user’s head pose is not precise.