
can be used for both images and videos with static environments. Methods which use static frameworks can be further classified into three groups based on the modelling of the skin colour distribution: explicit boundary specification, parametric modelling and non-parametric modelling. On the other hand, dynamic framework-based methods are used for videos having dynamic environments, such as varying illumination and dynamic background conditions. A more detailed discussion of existing skin detection methods is given below.

1.4.1 Skin detection methods using static framework

The majority of existing skin detection methods use a static framework. This implies that the characteristics of background and illumination do not vary with time. Hence, these types of skin detection methods are intended for both images and videos with static environments. These methods use either a global skin detection model or a local skin detection model. Kakumanu et al. [71] and Kawulok et al. [8] provided comprehensive surveys of different approaches of skin

Boundary specification for skin colour depends on a set of thresholds and conditions which could be defined either in the same colour space (e.g. RGB) or in a transformed colour space, such as YCbCr, HSV, CIELab, etc.

One of the earliest methods of skin detection was proposed by Sobottka and Pitas [43]. They proposed a skin detection boundary along the $S$ and $H$ channels in HSV colour space as $S \in [0.23, 0.68]$ and $H \in [0, 50]$. Later, Tsekeridou and Pitas proposed a modification [2] to this method for face region segmentation in an image watermarking system [72]. The corresponding boundary rule in the HSV colour space is as follows:

$$
\begin{cases}
(0 \le H \le 25) \lor (335 \le H \le 360) \\
(0 \le S \le 0.6) \land (0.4 \le V)
\end{cases}
\tag{1.21}
$$
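As a rough illustration, the boundary rule (1.21) can be applied to whole images with array operations. The sketch below assumes that the two rows of the rule are combined by a logical AND, that hue is given in degrees and that saturation and value lie in [0, 1]; the function name `hsv_skin_mask` is hypothetical.

```python
import numpy as np

def hsv_skin_mask(h, s, v):
    """Boolean skin mask for rule (1.21).

    h is hue in degrees [0, 360]; s and v are in [0, 1].
    Assumption: the hue row and the saturation/value row are ANDed.
    """
    h, s, v = (np.asarray(a, dtype=float) for a in (h, s, v))
    hue_ok = ((h >= 0) & (h <= 25)) | ((h >= 335) & (h <= 360))
    sat_val_ok = (s >= 0) & (s <= 0.6) & (v >= 0.4)
    return hue_ok & sat_val_ok

# Four example pixels: reddish, bluish, wrap-around red, over-saturated red.
mask = hsv_skin_mask([10.0, 200.0, 350.0, 10.0],
                     [0.3, 0.3, 0.5, 0.8],
                     [0.7, 0.7, 0.5, 0.7])
```

Only the first and third pixels pass: the second fails the hue test and the fourth exceeds the saturation limit.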

Figure 1.6: Skin colour distributions in different colour planes: (a) Tsekeridou and Pitas [2], (b) Solina et al. [3], (c) Hsu et al. [4], (d) Kukharev and Nowosielski [5], (e) A. Cheddad et al. [6], and (f) Y.-H. Chen et al. [7]. (Note: the figure is taken from [8])

Figure 1.6-a shows the projection of the above rules onto the RGB colour space. Here, a darker shade implies a higher density of skin pixels. Solina et al. [3] proposed another set of fixed rules in RGB space for face detection of people having fair complexion as:

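$$
\begin{cases}
(R > 95) \land (G > 40) \land (B > 20) \\
\max(R, G, B) - \min(R, G, B) > 15 \\
|R - G| > 15 \land (R > G) \land (R > B)
\end{cases}
\quad \text{in uniform daylight illumination}
\tag{1.22}
$$

or,

$$
\begin{cases}
(R > 220) \land (G > 210) \land (B > 170) \\
|R - G| \le 15 \land (R > G) \land (R > B)
\end{cases}
\quad \text{in flashlight or lateral illumination}
\tag{1.23}
$$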
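A minimal sketch of the two RGB rule sets is given below, combined with a logical OR as is done when the lighting is unknown. It assumes 8-bit RGB input, treats the rows of each rule as a conjunction, and uses $|R - G| > 15$ for the daylight case as listed in Table 1.1; the function name `rgb_skin_mask` is hypothetical.

```python
import numpy as np

def rgb_skin_mask(rgb):
    """Skin mask from rules (1.22) and (1.23); rgb has shape (..., 3), values in [0, 255]."""
    rgb = np.asarray(rgb, dtype=np.int32)   # signed type so that R - G cannot wrap around
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    spread = rgb.max(axis=-1) - rgb.min(axis=-1)
    red_dominant = (r > g) & (r > b)
    daylight = ((r > 95) & (g > 40) & (b > 20) & (spread > 15)
                & (np.abs(r - g) > 15) & red_dominant)
    flashlight = ((r > 220) & (g > 210) & (b > 170)
                  & (np.abs(r - g) <= 15) & red_dominant)
    # Unknown lighting: accept a pixel if either rule set holds.
    return daylight | flashlight

pixels = [[200, 120, 90], [230, 225, 180], [50, 60, 200], [100, 100, 100]]
mask = rgb_skin_mask(pixels)
```

The first pixel satisfies the daylight rule and the second the flashlight rule; the blue and grey pixels are rejected by both.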

For unknown lighting conditions, a pixel is classified as skin if it satisfies one of the above two conditions. These rules are illustrated in Figure 1.6-b in R-G, R-B, G-B and r-g planes. Hsu et al. [4] proposed a boundary rule based on YCbCr colour space. The authors observed that the shape of the skin tone cluster in Cb-Cr space can be approximated as an elliptical structure where the cluster location depends on luminance $Y$. They performed a non-linear modification to $C_b$ and $C_r$ values if $Y < 125$ or $Y > 188$. Subsequently, the skin pixel cluster is modelled as an ellipse in the transformed $C_b C_r$ space. The equivalent results for skin distribution in RGB space are shown in Figure 1.6-c. Kukharev and Nowosielski proposed another set of skin detection rules [5] using RGB and YCbCr colour spaces as follows:

$$
\begin{cases}
(R > G) \land (R > B) \\
\{(G > B) \land (5R - 12G + 7B > 0)\} \lor \{(G < B) \land (5R + 7G - 12B > 0)\} \\
\{C_r \in (135, 180)\} \land \{C_b \in (85, 135)\} \land (Y > 80)
\end{cases}
\tag{1.24}
$$
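The mixed RGB/YCbCr rule (1.24) can be sketched in the same way. The text does not state which RGB to YCbCr conversion is used, so the full-range BT.601 (JPEG-style) conversion below is an assumption, as is the conjunction of the three rows; `kukharev_skin_mask` is a hypothetical name.

```python
import numpy as np

def kukharev_skin_mask(rgb):
    """Skin mask from rule (1.24); rgb has shape (..., 3), values in [0, 255]."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Assumed conversion: full-range BT.601, as used by JPEG.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    rule_rgb = (r > g) & (r > b)
    # With strict inequalities, pixels with G == B satisfy neither branch.
    rule_mix = (((g > b) & (5 * r - 12 * g + 7 * b > 0))
                | ((g < b) & (5 * r + 7 * g - 12 * b > 0)))
    rule_ycbcr = (cr > 135) & (cr < 180) & (cb > 85) & (cb < 135) & (y > 80)
    return rule_rgb & rule_mix & rule_ycbcr

mask = kukharev_skin_mask([[200, 120, 90], [50, 60, 200]])
```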

The corresponding model representation in RGB and rg space is given in Figure 1.6-d. Cheddad et al. [6] transformed the normalized RGB colour space into a single-dimensional error signal, where the skin colour distribution can be modelled as a Gaussian curve. Subsequently, a pixel is classified as skin if its 1D equivalent value lies within the two threshold values determined by the standard deviation of the curve. The skin model is shown in Figure 1.6-e. Recently, Chen et al. proposed a new RGB subspace for skin detection by subtracting the RGB values: $sR = R - G$, $sG = G - B$, $sB = R - B$. Subsequently, they proposed a boundary rule $\{(-142 < sR < 18) \land (-48 < sG < 92) \land (-32 < sB < 192)\}$. The rules are illustrated in Figure 1.6-f.

Table 1.1: Most popular examples of colour spaces used in skin detection: RGB, YCbCr, HSV and advanced space ER/GH [18]

Colour space | Range of components | Restrictions for skin colour
RGB | R, G, B: [0, 255] | $R > 95 \land G > 40 \land B > 20 \land \{\max(R, G, B) - \min(R, G, B) > 15\} \land |R - G| > 15 \land R > G \land R > B$
YCbCr | Y, Cb, Cr: [0, 255] | $Y > 80 \land 77 < C_b < 127 \land 133 < C_r < 173$
HSV | H: [0, 360], S, V: [0, 1] | $0 < H < 50 \land 0.1 < S < 0.68 \land 0.35 < V < 1$
ER/GH | R, G: [0, 255], H: [0, 360] | $13.4224 < E \land R/G < 1.7602 \land H < 23.89$

In [73], Shaik et al. compared HSV and YCbCr spaces for skin detection using a boundary-based method. In 2015, Sawicki and Miziolek [18] proposed another set of boundary rules in CMYK space as follows:

Before ROC analysis:

$$
(K < 205) \land (0 \le C \le 0.05) \land (0.089 < Y < 1) \land (0 \le C/Y < 1) \land (0.1 \le Y/M < 4.8)
\tag{1.25}
$$

After ROC analysis:

$$
(K < 205) \land (0 \le C \le 0.05) \land (0.0909 < Y < 0.945) \land (0.1 \le Y/M < 4.67)
\tag{1.26}
$$

Apart from these simple boundary specifications in different colour spaces, advanced approaches have also been proposed for a more accurate 3D description of the skin cluster. For example, Garcia and Tziritas [74] proposed a skin detection method by utilizing a set of planes in the YCbCr space. Brand and Mason [50] performed a comparative analysis of algorithms in three colour spaces: RGB, YES and YIQ. In their analysis, parametric thresholds and statistical functions are used. Thresholding of the $R/G$ ratios is also performed. In [51], a new colour space $ER/GH$ is proposed by mixing colour components. In the pseudo-space $ER/GH$, $E$ belongs to YES, the $R/G$ ratio is from RGB space, and $H$ is from HSV.

Some authors used additional information such as texture features to improve skin detection. For example, Wang et al. [75] used the gray-level co-occurrence matrix (GLCM) for skin detection. In this method, white balancing is performed in YCbCr colour space to minimize the effect of uncontrolled illumination conditions. Firstly, the $Y$ components are arranged in descending order. The minimum value of the top 5% values of the $Y$ component is termed as a parameter $E$, and the remaining values in the top 5% are set to 255. Similarly, the maximum value among the bottom 5% values of the $Y$ component is termed as a parameter $B$, and the remaining values in the bottom 5% are set to 0. Finally, the intermediate $Y$ components are re-calculated as:

$$
g(x, y) = 255 \times \frac{\ln f(x, y) - \ln B}{\ln E - \ln B}
\tag{1.27}
$$

where $g(x, y)$ is the white balanced luminance value at location $(x, y)$, and $f(x, y)$ is the original luminance value before white balancing. The skin colour model is defined by a set of boundary rules in RGB space. The authors also found that the skin distribution in YCgCb space takes a circular shape. Finally, a skin mask is obtained by ANDing two skin models derived from RGB and YCbCr spaces. Detection performance is further improved by incorporating a texture analysis into this skin model. Textural features are extracted using the GLCM. For a given gray-scale image $I$ of size $n \times m$, the GLCM is given by:

$$
T(i, j) = \sum_{x=1}^{n} \sum_{y=1}^{m}
\begin{cases}
1, & \text{if } I(x, y) = i \land I(x + \Delta_x, y + \Delta_y) = j \\
0, & \text{otherwise,}
\end{cases}
\tag{1.28}
$$

where $(\Delta_x, \Delta_y)$ is the offset between the pixels $I(x, y)$ and $I(x + \Delta_x, y + \Delta_y)$. The computational complexity in determining the GLCM depends on the number of grey levels $g$, and it is proportional to $O(g^2)$.
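The counting in Equation (1.28) can be written directly as a double loop. This is a plain illustrative sketch, not an optimised routine: the function name `glcm` is hypothetical, indices are zero-based, and pixel pairs whose second member falls outside the image are skipped, which the equation itself does not specify.

```python
import numpy as np

def glcm(image, dx, dy, levels):
    """Co-occurrence counts T[i, j] of grey levels i and j at offset (dx, dy)."""
    image = np.asarray(image)
    n, m = image.shape
    t = np.zeros((levels, levels), dtype=np.int64)
    for x in range(n):
        for y in range(m):
            x2, y2 = x + dx, y + dy
            # Assumption: pairs that leave the image are ignored.
            if 0 <= x2 < n and 0 <= y2 < m:
                t[image[x, y], image[x2, y2]] += 1
    return t

img = np.array([[0, 0, 1],
                [1, 2, 2],
                [0, 1, 2]])
T = glcm(img, dx=0, dy=1, levels=3)   # neighbour one step along the second index
```

For this 3 × 3 example with three grey levels, `T` holds six counted pairs in a 3 × 3 matrix, which illustrates why the storage grows with the square of the number of grey levels.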

However, recently published literature shows that the performance of explicit boundary specification-based methods is not better than that of model-based approaches [8].

skin colour distribution. In this work, statistical tests are provided to showcase the advantage of using GMM over SGM for skin colour distribution modelling. Greenspan et al. [77] showed that a GMM-based representation of the skin pixel distribution is more robust to environmental changes, such as colour space changes, highlights and shadows. They used two Gaussian components for the GMM: one represents the distribution of skin colour under normal light, while the other represents the distribution of the more highlighted regions of the skin. Caetano et al. [78] used two to eight Gaussian components for pixel distribution modelling in $rg$ colour space for people having different skin tones. Lee and Yoo [55] proposed an elliptical modelling-based approach for skin detection. Elliptical modelling is less computationally complex than GMM modelling. However, many true skin pixels may be rejected if the ellipse is small. On the other hand, if the ellipse is sufficiently large, many non-skin pixels may be detected as skin pixels. They used six Gaussian components to implement the GMM. On the other hand, Thu et al. [79] used four Gaussian components. The use of multiple Gaussians enables detection of different parts of a face which are illuminated differently. Jones and Rehg [13] used two separate GMMs, each having 16 Gaussian components, for the skin and non-skin pixel distributions. A skin probability map (SPM) for an image is derived from the two models using Bayes' theorem. The SPM is a 2D array of size equal to the image. An element of the SPM represents the a posteriori probability of a pixel being skin at that location.
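The Bayes step that turns two class-conditional models into an SPM can be sketched as follows. The likelihood arrays are taken as given (in the method described above they would come from the skin and non-skin GMMs); the function name, the prior of 0.5 and the handling of pixels where both likelihoods vanish are assumptions made for illustration.

```python
import numpy as np

def skin_probability_map(p_c_skin, p_c_nonskin, prior_skin=0.5):
    """Posterior P(skin | colour) per pixel from the two class likelihoods."""
    p_c_skin = np.asarray(p_c_skin, dtype=float)
    p_c_nonskin = np.asarray(p_c_nonskin, dtype=float)
    num = p_c_skin * prior_skin
    den = num + p_c_nonskin * (1.0 - prior_skin)
    # Where both likelihoods are zero the posterior is undefined; report 0 there.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

spm = skin_probability_map([[0.8, 0.1], [0.0, 0.2]],
                           [[0.2, 0.9], [0.0, 0.2]])
```

The result has the same shape as the input image, and thresholding it yields a binary skin mask.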

The performance of these simple parametric models is limited due to two major factors: a) apparent change in skin appearance due to uncontrolled illumination conditions, and b) the presence of skin-like colours in the image background. To overcome these problems, different authors proposed different improvements over simple parametric models for skin detection. Phung et al. [38] proposed an adaptive scheme to select the optimum threshold for the SPM by assuming a skin region to be coherent and homogeneous in texture. Segmentation accuracy of skin regions can be further improved by incorporating texture analysis in the parametric modelling framework. Texture features can be extracted by performing texture analysis in various domains, such as grayscale [75,80], colour [81], or skin map [82]. In order to extract texture features, different authors used different feature descriptors. Jiang et al. [83] proposed a new approach by incorporating texture and space analysis in a standard SPM framework. An initial skin mask is derived for an image by thresholding the SPM with a low threshold. Subsequently, textural features are extracted using Gabor wavelets. This gives a textural map for the image. The texture map is thresholded based on an assumption that background regions are coarser than skin regions. This gives a texture mask or a texture filter, which is later combined with the initial skin mask to obtain a more accurate skin mask. This reduces false acceptance error significantly. Finally, watershed segmentation is employed with a set of well-defined region markers to grow skin regions and reduce false rejection error. H.-M. Sun [84] proposed a local adaptation scheme for the Bayesian classifier proposed by Jones and Rehg [13]. They generated a local skin model from a set of skin pixel samples from the image. Finally, the local model is combined with the global or trained model in a weighted sum approach. P. Ng and C. M. Pun combined 2-D Daubechies wavelets-based texture analysis with a GMM-based colour model [85]. The 2-D Daubechies wavelets are calculated by using the sub-images which are centered at each of the pixel locations. The texture feature at each pixel location is represented by the wavelet energy vector $v_e$, which is obtained by applying Shannon entropy on the wavelet coefficients vector $v_c$. The $v_e$ for all the pixel locations are finally grouped into a set of clusters using the k-Means clustering algorithm. Finally, some of the clusters are marked as non-skin based on their Shannon entropies and eliminated accordingly. Kawulok et al. [9] used linear discriminant analysis (LDA) to derive discriminative features between skin and non-skin regions. In this method, the LDA projection matrix is derived by using colour and local texture features from a set of labelled images. The projection matrix therefore depends on the training data, for which it ensures the best possible inter-class discrimination.

Figure 1.7: A flowchart showing (a) training and (b) detection processes proposed by Kawulok [9]

Another approach makes use of spatial analysis of skin regions by exploiting the spatial alignment of skin pixels and their relation with neighbourhood pixels [14,80,86,87]. These approaches significantly reduce false positives in detecting the skin regions. In general, all these spatial analysis-based methods are based on a standard SPM. Ruiz-del-Solar and Verschae [86] proposed a skin detection method which uses a controlled diffusion. The controlled diffusion process has two steps: a) extraction of diffusion seeds, and b) the actual diffusion process. The diffusion seeds are extracted by thresholding the SPM with a high threshold. In the diffusion step, skin regions are grown from the seeds by including the neighbourhood pixels which satisfy a given diffusion criterion. The criterion depends on two factors: a) the difference between a source and a test pixel in the diffusion domain, and b) the SPM value at the test pixel location. Therefore, this method works well if skin regions have sharp boundaries. A leak in diffusion may occur if there are smooth transitions between pixels from one region to another. In 2010, Kawulok proposed an energy-based scheme for skin blob analysis [87]. Pixels with high SPM values are selected as skin seeds. These seed regions are subjected to morphological erosion to further reduce false acceptance. In this method, seed pixels are assumed to have a maximal energy, which is spread over the image. The amount of energy transferred to an adjacent pixel from a source pixel depends on the skin probability of the adjacent pixel. A pixel is excluded from the skin region if there is no energy left to be passed on to it from a source pixel. In 2013, M. Kawulok [14] proposed a propagation-based region growing method, which utilises the spatial relationship between the pixels. Kawulok's method is based on Dijkstra's minimum path-cost algorithm [88]. In this method, each pixel is considered as an independent node and the image is the corresponding graph. The optimum values of the region growing parameters are selected manually.
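The seed-and-grow idea behind the controlled diffusion can be illustrated with a simplified breadth-first region growing on the SPM. This is a stand-in sketch, not the published criterion: it assumes the SPM itself is the diffusion domain, uses 4-connectivity, and the three thresholds (`seed_thr`, `accept_thr`, `max_step`) are hypothetical values.

```python
from collections import deque
import numpy as np

def grow_skin_regions(spm, seed_thr=0.8, accept_thr=0.4, max_step=0.2):
    """Grow skin regions from high-probability seeds over a skin probability map."""
    spm = np.asarray(spm, dtype=float)
    mask = spm >= seed_thr                      # a) seeds: high SPM threshold
    queue = deque(zip(*np.nonzero(mask)))
    rows, cols = spm.shape
    while queue:                                # b) diffusion from the seeds
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and not mask[rr, cc]:
                # Accept a neighbour if it differs little from the source
                # pixel and its own skin probability is high enough.
                close = abs(spm[rr, cc] - spm[r, c]) <= max_step
                if close and spm[rr, cc] >= accept_thr:
                    mask[rr, cc] = True
                    queue.append((rr, cc))
    return mask

spm = np.array([[0.9, 0.75, 0.6, 0.1],
                [0.1, 0.1, 0.45, 0.1]])
grown = grow_skin_regions(spm)
```

In the example, the region spreads from the single seed along the gradually decreasing probabilities and stops at the sharp drop to 0.1, which mirrors the remark that the approach works best when skin regions have sharp boundaries.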

There is another class of skin segmentation approaches, which use some prior information about the actual skin colour of a person present in an image. In general, human skin colour does not show significant variations over the body. So, a face detector can be used to detect the face and extract a set of pixels belonging to the facial region. The prior information obtained from the facial pixels is then utilized to segment out other skin regions of the human body. A global skin detection model can be locally adapted according to the distribution of facial skin pixels. Fritsch et al. [89] used face detection to derive a local skin model for skin region tracking. In 2008, Kawulok [19] proposed a dynamic skin model by using pixel characteristics of facial regions. The global pixel statistics are fused with the local statistics of facial skin pixels. Yogarajah et al. [20] used a dynamic thresholding-based method for skin detection. In this method, a dynamic threshold is obtained from the characteristics of skin pixels extracted from the facial regions. Tan et al. [10] proposed a fusion-based skin detection method using face detection. For this, a smoothed colour histogram and a Gaussian model of skin are fused together. Kawulok et al. [21] showed that the selection of seed points using facial pixels can improve the detection accuracy in a region growing-based skin detection method. Pixels extracted from the face provide a good estimate of the colour distribution of skin regions even in the presence of skin-like backgrounds and/or poor illumination conditions.

Figure 1.8: Proposed framework by Tan et al. [10]: eye detector, 2-D histogram, Gaussian model, and fusion strategy.

Cortes and Vapnik showed that support vector machines (SVMs) can also be used for skin detection. In general, the number of training samples for skin and non-skin pixels usually becomes too large for the SVM to handle. Han et al. [25] proposed a skin segmentation method using an active learning-based SVM classifier and region information. SVM active learning is a well-known approach to deal with large training datasets [94]. In this method, it is assumed that the region information is robust to illumination variations and noise. Each image is divided into a number of regions. A region is selected as skin if it satisfies the following criterion:

$$
\frac{N_S(R_i)}{N_T(R_i)} > \eta
\tag{1.29}
$$

where $N_S(R_i)$ and $N_T(R_i)$ are the number of skin pixels and the total number of pixels in the region $R_i$, respectively; $\eta$ is a pre-defined constant.
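Criterion (1.29) amounts to a per-region ratio test, which can be sketched with `np.bincount`. The inputs are assumed to be an integer label map of the regions and a pixel-wise skin mask from some classifier; the function name and the value $\eta = 0.5$ are hypothetical.

```python
import numpy as np

def skin_regions(region_labels, skin_mask, eta=0.5):
    """Return one boolean per region label: True if NS(R_i) / NT(R_i) > eta."""
    labels = np.asarray(region_labels).ravel()
    skin = np.asarray(skin_mask, dtype=bool).ravel()
    n_total = np.bincount(labels)                          # NT(R_i)
    n_skin = np.bincount(labels, weights=skin.astype(float),
                         minlength=n_total.size)           # NS(R_i)
    # Labels that never occur have NT = 0; their ratio is reported as 0.
    ratio = np.divide(n_skin, n_total,
                      out=np.zeros(n_total.size), where=n_total > 0)
    return ratio > eta

labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
pixel_skin = np.array([[1, 1, 0],
                       [0, 0, 1]])
selected = skin_regions(labels, pixel_skin)
```

Here region 0 has two skin pixels out of three and is kept, while region 1 has one out of three and is rejected.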

A more popular non-parametric approach is the use of back propagation ANNs (BPANNs) for skin detection [11,95]. For example, Chen et al. [58] proposed a skin detection algorithm using a BPANN with genetic optimization. In their work, pixel components in RGB space are transformed into the normalized RGB space. The $r$ and $g$ components of the pixels are then fed into a BPANN made of 2 input neurons, 4 hidden neurons in two hidden layers, and an output neuron. Each neuron's response is characterised by a logistic sigmoid function given by:

$$
f(x) = \frac{1}{1 + e^{-\sigma x}}
\tag{1.30}
$$

where $\sigma$ is the steepness of the sigmoid curve, $x$ is the weighted sum of the inputs, and $f(x)$ is the output. The stability and convergence of the ANN depend on the parameter $\sigma$. So, they used a genetic algorithm (GA) to optimize the selection of the parameter $\sigma$. Finally, if the colour components $r, g, b$ in normalized RGB space satisfy $r > g$ or $r > b$, then the $r$ and $g$ components are fed into the BPANN classifier. Seow et al. [11] proposed a skin colour model for face detection, which aims at reducing the effect of skin colour variations among different people. They used a 3-layered BPANN with the $r, g, b$ components as inputs, as shown in Figure 1.9. A set of 410 skin samples (each containing a $10 \times 10$ patch) is collected from skin regions belonging to different races. Since the sample set cannot represent the entire skin population, a Multi-Layer Perceptron (MLP) ANN is trained by using a Back Propagation (BP) algorithm for interpolation of the sample set. Finally, a $256 \times 256 \times 256$ colour cube is generated to obtain all the possible colour combinations, and they are fed into the MLP to extract the skin regions.

Figure 1.9: Skin detection using ANN proposed by Seow et al. [11]

Yang et al. used an ANN along with adaptive skin modelling to detect skin regions more accurately in an image, as shown in Figure 1.10. In this method, the luminance component $Y$ of the YCbCr colour space is used for reducing the effects of illumination variations. At first, the $Y$ component is arranged in descending order and divided into multiple equidistant intervals. After that, pixels belonging to the same luminance interval are selected, and the corresponding mean and covariance matrix of the selected pixels in $C_b, C_r$ space are calculated. The luminance mean of each interval and the covariance and mean in $C_b, C_r$ space are used to train a three-layer ANN. Finally, the output of the ANN is fed to a Gaussian classifier for skin classification.
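The per-interval statistics that serve as training data in this scheme can be sketched as below. Only the statistics step is shown, not the ANN or the Gaussian classifier; the function name, the default number of intervals and the decision to skip intervals with fewer than two pixels are assumptions made for illustration.

```python
import numpy as np

def interval_statistics(y, cb, cr, n_intervals=4):
    """Mean luminance, mean (Cb, Cr) and (Cb, Cr) covariance per luminance interval."""
    y = np.asarray(y, dtype=float).ravel()
    cbcr = np.stack([np.ravel(cb), np.ravel(cr)], axis=1).astype(float)
    # Equidistant intervals over the observed luminance range.
    edges = np.linspace(y.min(), y.max(), n_intervals + 1)
    idx = np.clip(np.digitize(y, edges) - 1, 0, n_intervals - 1)
    stats = []
    for k in range(n_intervals):
        sel = idx == k
        if sel.sum() < 2:        # a covariance needs at least two samples
            continue
        stats.append((y[sel].mean(),
                      cbcr[sel].mean(axis=0),
                      np.cov(cbcr[sel], rowvar=False)))
    return stats

stats = interval_statistics(y=[10, 20, 30, 40, 100, 110, 120, 130],
                            cb=[100, 102, 104, 106, 110, 112, 114, 116],
                            cr=[150, 151, 152, 153, 160, 161, 162, 163],
                            n_intervals=2)
```

Each returned tuple pairs one luminance mean with a 2-element chrominance mean and a 2 × 2 covariance matrix, which is the kind of (input, target) data a network could be trained on to predict a luminance-dependent Gaussian skin model.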

Figure 1.10: Proposed framework by Yang et al. [12]. (Flowchart stages: image database; colour space conversion from RGB to YCbCr; division of the total range of Y into a finite number of intervals N; mean and variance statistics of each interval; BP neural network; adaptive skin model; skin colour classification of the test image.)

skin regions. Therefore, under unconstrained illumination and background conditions, the skin pixels cannot be located perfectly. Also, it requires a set of labelled initial frames to train a Support Vector Machine (SVM) classifier, and the initial positions of skin-coloured objects need to be determined. Han et al. [101] proposed a skin segmentation and tracking algorithm for sign language recognition by using SVM active learning. The SVM is trained on a set of initial frames. Active learning makes the algorithm computationally less expensive. However, the major drawback of this method is its inability to handle varying illumination conditions. The SVM needs to be re-learned at every frame to handle the varying illumination conditions, and training an SVM repeatedly for every frame is computationally very expensive. Liu et al. [102] proposed a dynamic skin detection algorithm for videos. In this method, a face detection-based model update scheme is proposed for varying illumination conditions. However, only global illumination variations are considered, and the method is not suitable for local illumination changes, which mostly occur due to moving body parts.

1.5 Research Motivation

From the brief literature survey presented in this report, it is evident that a significant amount of work is needed to efficiently detect skin regions in images under different environmental conditions. Additionally, detecting skin regions in videos in the presence of varying illumination conditions is another important task. Combining these requirements in one algorithm is a major challenge. Accordingly, this thesis looks into several aspects using chromatic and textural information of skin regions and aims at developing suitable algorithms that take care of some limitations of the existing methods. The motivations behind this research work are given below:

(i) Typically, a chromatic and/or textural discrimination is observed between skin and non-skin regions of an image. Kawulok et al. [9] used a linear discriminant analysis (LDA)-based most discriminative feature extraction approach for skin detection. However, the extracted features depend totally on the training data. Natural images are in general uncorrelated, in the sense that the spatial distribution of texels and colours is quite random. Hence, the discriminative features extracted by the LDA may not be the most discriminative features for an unknown image. Hence, an image specific discriminative feature extraction