can be used for both images and videos with static environments. Methods which use static frameworks can be further classified into three groups based on how the skin colour distribution is modelled: explicit boundary specification, parametric modelling and non-parametric modelling. On the other hand, dynamic framework-based methods are used for videos having dynamic environments, such as varying illumination and dynamic background conditions. A more detailed discussion of existing skin detection methods is given below.
1.4.1 Skin detection methods using static framework
The majority of existing skin detection methods use a static framework. This implies that the characteristics of the background and illumination do not vary with time. Hence, these types of skin detection methods are intended for both images and videos in static environments. These methods use either a global skin detection model or a local skin detection model. Kakumanu et al. [71] and Kawulok et al. [8] provided comprehensive surveys of different approaches to skin detection.
Boundary specification for skin colour depends on a set of thresholds and conditions, which can be defined either in the same colour space (e.g. RGB) or in a transformed colour space, such as YCbCr, HSV, CIELab, etc.
One of the earliest methods of skin detection was proposed by Sobottka and Pitas [43]. They proposed a skin detection boundary along the $S$ and $H$ channels of the HSV colour space as $S \in [0.23, 0.68]$ and $H \in [0, 50]$. Later, Tsekeridou and Pitas proposed a modification [2] of this method for face region segmentation in an image watermarking system [72]. The corresponding boundary rule in the HSV colour space is as follows:
$$\{(0 \le H \le 25) \lor (335 \le H \le 360)\} \land (0 \le S \le 0.6) \land (0.4 \le V) \tag{1.21}$$
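As an illustration, the two HSV boundary rules above can be written as simple predicates. This is a minimal sketch, assuming 8-bit RGB input converted with Python's standard `colorsys` module, with $H$ expressed in degrees and $S$, $V$ in $[0, 1]$; the function names are hypothetical and are not taken from [43] or [2].

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB to (H in degrees, S in [0, 1], V in [0, 1])."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


def sobottka_pitas_skin(h, s):
    """Sobottka and Pitas: S in [0.23, 0.68] and H in [0, 50] degrees."""
    return 0.23 <= s <= 0.68 and 0 <= h <= 50


def tsekeridou_pitas_skin(h, s, v):
    """Rule (1.21): hue wraps around red, so two hue intervals are accepted."""
    hue_ok = (0 <= h <= 25) or (335 <= h <= 360)
    return hue_ok and (0 <= s <= 0.6) and (0.4 <= v)


if __name__ == "__main__":
    for rgb in [(224, 172, 138), (40, 80, 200)]:
        h, s, v = rgb_to_hsv_degrees(*rgb)
        print(rgb, sobottka_pitas_skin(h, s), tsekeridou_pitas_skin(h, s, v))
```

The second interval $335 \le H \le 360$ is needed because hue is an angle: reddish skin tones sit on both sides of $0^\circ$.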
Figure 1.6-a shows the projection of the above rules onto the RGB colour space. Here, a darker shade implies a higher density of skin pixels.

Figure 1.6: Skin colour distributions in different colour planes: (a) Tsekeridou and Pitas [2], (b) Solina et al. [3], (c) Hsu et al. [4], (d) Kukharev and Nowosielski [5], (e) A. Cheddad et al. [6], and (f) Y.-H. Chen et al. [7]. (Note: the figure is taken from [8].)

Solina et al. [3] proposed another set of fixed rules in RGB space for face detection of people having a fair complexion:
$$(R > 95) \land (G > 40) \land (B > 20) \land \{\max(R, G, B) - \min(R, G, B) > 15\} \land (\lvert R - G \rvert > 15) \land (R > G) \land (R > B) \quad \text{in uniform daylight illumination} \tag{1.22}$$
or,
$$(R > 220) \land (G > 210) \land (B > 170) \land (\lvert R - G \rvert \le 15) \land (R > G) \land (R > B) \quad \text{in flashlight lateral illumination} \tag{1.23}$$
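A minimal sketch of rules (1.22) and (1.23) for a single 8-bit RGB pixel is given below. It assumes the daylight condition uses $\lvert R - G \rvert > 15$, as listed for RGB in Table 1.1, keeps the flashlight rule exactly as written in (1.23), and combines the two rules with a logical OR for unknown lighting; the function names are hypothetical.

```python
def solina_daylight(r, g, b):
    """Rule (1.22): uniform daylight illumination (8-bit RGB)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def solina_flashlight(r, g, b):
    """Rule (1.23): flashlight lateral illumination, as written in the text."""
    return (r > 220 and g > 210 and b > 170
            and abs(r - g) <= 15 and r > g and r > b)


def solina_skin(r, g, b):
    """Unknown lighting: a pixel is skin if either rule holds."""
    return solina_daylight(r, g, b) or solina_flashlight(r, g, b)


if __name__ == "__main__":
    print(solina_skin(224, 172, 138), solina_skin(230, 220, 180), solina_skin(40, 80, 200))
```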
For unknown lighting conditions, a pixel is classified as skin if it satisfies either of the above two conditions. These rules are illustrated in Figure 1.6-b in the R-G, R-B, G-B and r-g planes. Hsu et al. [4] proposed a boundary rule based on the YCbCr colour space. The authors observed that the shape of the skin tone cluster in the Cb-Cr space can be approximated by an elliptical structure whose location depends on the luminance $Y$. They performed a non-linear modification of the $C_b$ and $C_r$ values if $Y < 125$ or $Y > 188$. Subsequently, the skin pixel cluster is modelled as an ellipse in the transformed space $C_b' C_r'$. The equivalent skin distribution in RGB space is shown in Figure 1.6-c. Kukharev and Nowosielski proposed another set of skin detection rules [5] using the RGB and YCbCr colour spaces as follows:
$$(R > G) \land (R > B) \land \big[\{(G > B) \land (5R - 12G + 7B > 0)\} \lor \{(G < B) \land (5R + 7G - 12B > 0)\}\big] \land \{C_r \in (135, 180)\} \land \{C_b \in (85, 135)\} \land (Y > 80) \tag{1.24}$$
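Rule (1.24) needs both RGB and YCbCr values of a pixel. The sketch below assumes the full-range ITU-R BT.601 (JPEG) RGB-to-YCbCr conversion, since the text does not state which variant [5] uses; the function names are hypothetical. As written, a pixel with $G = B$ satisfies neither branch of the bracketed condition and is therefore rejected.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range ITU-R BT.601 (JPEG) conversion (an assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def kukharev_nowosielski_skin(r, g, b):
    """Rule (1.24): RGB conditions AND YCbCr conditions."""
    if not (r > g and r > b):
        return False
    rgb_ok = ((g > b and 5 * r - 12 * g + 7 * b > 0)
              or (g < b and 5 * r + 7 * g - 12 * b > 0))
    y, cb, cr = rgb_to_ycbcr(r, g, b)
    ycbcr_ok = 135 < cr < 180 and 85 < cb < 135 and y > 80
    return rgb_ok and ycbcr_ok


if __name__ == "__main__":
    print(kukharev_nowosielski_skin(224, 172, 138))  # skin-like pixel
    print(kukharev_nowosielski_skin(40, 80, 200))    # blue pixel
```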
The corresponding model representation in the RGB and rg spaces is given in Figure 1.6-d. Cheddad et al. [6] transformed the normalized RGB colour space into a one-dimensional error signal, where the skin colour distribution can be modelled as a Gaussian curve [6]. Subsequently, a pixel is classified as skin if its 1D equivalent value lies between two threshold values determined by the standard deviation of the curve. The skin model is shown in Figure 1.6-e.
Recently, Chen et al. proposed a new RGB subspace for skin detection by subtracting the RGB values: $sR = R - G$, $sG = G - B$ and $sB = R - B$. Subsequently, they proposed the boundary rule
$$(-142 < sR < 18) \land (-48 < sG < 92) \land (-32 < sB < 192)$$
The rules are illustrated in Figure 1.6-f. In [73], Shaik et al. compared the HSV and YCbCr spaces for skin detection using a boundary-based method.

Table 1.1: Most popular examples of colour spaces used in skin detection: RGB, YCbCr, HSV and the advanced space ER/GH [18]

| Colour space | Range of components | Restrictions for skin colour |
|---|---|---|
| RGB | $R, G, B \in [0, 255]$ | $R > 95 \land G > 40 \land B > 20 \land \{\max(R, G, B) - \min(R, G, B) > 15\} \land \lvert R - G \rvert > 15 \land R > G \land R > B$ |
| YCbCr | $Y, Cb, Cr \in [0, 255]$ | $Y > 80 \land 77 < Cb < 127 \land 133 < Cr < 173$ |
| HSV | $H \in [0^\circ, 360^\circ]$; $S, V \in [0, 1]$ | $0^\circ < H < 50^\circ \land 0.1 < S < 0.68 \land 0.35 < V < 1$ |
| ER/GH | $R, G \in [0, 255]$; $H \in [0^\circ, 360^\circ]$ | $13.4224 < E \land R/G < 1.7602 \land H < 23.89$ |

In 2015, Sawicki and Miziolek [18] proposed another set of boundary rules in the CMYK space as follows:
• Before ROC analysis:
$$(K < 205) \land (0 \le C \le 0.05) \land (0.089 < Y < 1) \land (0 \le C/Y < 1) \land (0.1 \le Y/M < 4.8) \tag{1.25}$$
• After ROC analysis:
$$(K < 205) \land (0 \le C \le 0.05) \land (0.0909 < Y < 0.945) \land (0.1 \le Y/M < 4.67) \tag{1.26}$$

Apart from these simple boundary specifications in different colour spaces, more advanced approaches have also been proposed for a more accurate 3D description of the skin cluster. For example,
Garcia and Tziritas [74] proposed a skin detection method utilizing a set of planes in the YCbCr space. Brand and Mason [50] performed a comparative analysis of algorithms in three colour spaces: RGB, YES and YIQ. In their analysis, parametric thresholds and statistical functions are used. Thresholding of the $R/G$ ratio is also performed. In [51], a new colour space $ER/GH$ is proposed by mixing colour components. In the pseudo-space $ER/GH$, $E$ belongs to YES, the $R/G$ ratio is from the RGB space, and $H$ is from HSV.

Some authors used additional information, such as texture features, to improve skin detection. For example, Wang et al. [75] used the gray-level co-occurrence matrix (GLCM) for skin
detection. In this method, white balancing is performed in the YCbCr colour space to minimize the effect of uncontrolled illumination conditions. Firstly, the $Y$ components are arranged in descending order. The minimum value of the top 5% of the $Y$ components is termed the parameter $E$, and the remaining values in the top 5% are set to 255. Similarly, the maximum value among the bottom 5% of the $Y$ components is termed the parameter $B$, and the remaining values in the bottom 5% are set to 0. Finally, the intermediate $Y$ components are re-calculated as:
$$g(x, y) = 255 \times \frac{\ln f(x, y) - \ln B}{\ln E - \ln B} \tag{1.27}$$
where $g(x, y)$ is the white-balanced luminance value at location $(x, y)$, and $f(x, y)$ is the original luminance value before white balancing. The skin colour model is defined by a set of boundary
rules in RGB space. The authors also found that the skin distribution in the YCgCb space takes a circular shape. Finally, a skin mask is obtained by ANDing two skin models derived from the RGB and YCbCr spaces. Detection performance is further improved by incorporating a texture analysis into this skin model. Textural features are extracted using the GLCM. For a given gray-scale
image $I$ of size $n \times m$, the GLCM is given by:
$$T(i, j) = \sum_{x=1}^{n} \sum_{y=1}^{m} \begin{cases} 1, & \text{if } I(x, y) = i \land I(x + \Delta_x, y + \Delta_y) = j \\ 0, & \text{otherwise} \end{cases} \tag{1.28}$$
where $(\Delta_x, \Delta_y)$ is the offset between the pixels $I(x, y)$ and $I(x + \Delta_x, y + \Delta_y)$. The computational complexity of determining the GLCM depends on the number of grey levels $g$, and it is proportional to $O(g^2)$.
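Two steps of the method of Wang et al. [75] lend themselves to a short sketch: the luminance white balancing of (1.27) and the GLCM of (1.28). The NumPy code below is a minimal illustration under stated assumptions, not the authors' implementation: the 5% fraction is a parameter, $B$ is clamped to at least 1 so that $\ln B$ is defined, pixels tied with $E$ or $B$ are all saturated, offset pixels falling outside the image are skipped, and the contrast value is only one example of a GLCM statistic (the text does not say which features [75] uses). All function names are hypothetical.

```python
import numpy as np


def white_balance_luminance(y, fraction=0.05):
    """Luminance white balancing following Eq. (1.27); y holds values in [0, 255]."""
    y = np.asarray(y, dtype=float)
    ordered = np.sort(y.ravel())[::-1]              # descending order
    k = max(1, int(round(fraction * ordered.size)))
    e = ordered[k - 1]                              # minimum of the top 5%
    b = max(ordered[-k], 1.0)                       # maximum of the bottom 5%, kept > 0 for ln
    out = np.empty_like(y)
    top, bottom = y >= e, y <= b                    # ties with E or B are saturated too
    middle = ~(top | bottom)
    out[top] = 255.0
    out[bottom] = 0.0
    out[middle] = 255.0 * (np.log(y[middle]) - np.log(b)) / (np.log(e) - np.log(b))
    return out


def glcm(image, dx, dy, levels):
    """Eq. (1.28): T[i, j] counts pixels with I(x, y) = i and I(x + dx, y + dy) = j."""
    img = np.asarray(image, dtype=np.intp)
    n, m = img.shape
    t = np.zeros((levels, levels), dtype=np.int64)  # g x g matrix: the O(g^2) part
    x0, x1 = max(0, -dx), min(n, n - dx)            # keep pairs whose offset stays inside
    y0, y1 = max(0, -dy), min(m, m - dy)
    src = img[x0:x1, y0:y1]
    dst = img[x0 + dx:x1 + dx, y0 + dy:y1 + dy]
    np.add.at(t, (src.ravel(), dst.ravel()), 1)     # accumulate repeated (i, j) pairs
    return t


def glcm_contrast(t):
    """Example texture feature: sum over (i - j)^2 * p(i, j)."""
    p = t / t.sum()
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())


if __name__ == "__main__":
    t = glcm([[0, 0, 1], [1, 2, 2]], dx=0, dy=1, levels=3)
    print(t, glcm_contrast(t))
```

Quantising the image to fewer grey levels before calling `glcm` shrinks the $g \times g$ matrix, which is where the $O(g^2)$ dependence shows up.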
However, recently published literature shows that the performance of explicit boundary specification-based methods is not better than that of model-based approaches [8].
skin colour distribution. In this work, statistical tests are provided to showcase the advantage of using a GMM over an SGM for skin colour distribution modelling. Greenspan et al. [77] showed that a GMM-based representation of the skin pixel distribution is more robust to environmental changes, such as colour space changes, highlights and shadows. They also used two Gaussian components for the GMM: one represents the distribution of skin colour under normal light, while the other represents the distribution of the more highlighted regions of the skin. Caetano et al. [78] used two to eight Gaussian components for pixel distribution modelling in the $rg$ colour space for people having different skin tones. Lee and Yoo [55] proposed an elliptical modelling-based approach for skin detection. Elliptical modelling is less computationally complex than GMM modelling. However, many true skin pixels may be rejected if the ellipse is small. On the other hand, if the ellipse is sufficiently large, many non-skin pixels may be detected as skin pixels. They used six Gaussian components to implement the GMM. In contrast, Thu et al. [79] used four Gaussian components. The use of multiple Gaussians enables detection of different parts of a face which are illuminated differently. Jones and Rehg [13] used two separate GMMs, each having 16 Gaussian components, for the skin and non-skin pixel distributions. A skin probability map (SPM) for an image is derived from the two models using Bayes' theorem. The SPM is a 2D array of the same size as the image. Each element of the SPM represents the a posteriori probability that the pixel at that location is skin.
Figure 1.7: A flowchart showing (a) the training and (b) the detection processes proposed by Kawulok [9]

The performance of these simple parametric models is limited by two major factors: a) the apparent change in skin appearance due to uncontrolled illumination conditions, and b) the presence of skin-like colours in the image background. To overcome these problems, different authors have proposed various improvements over simple parametric models for skin detection. Phung et al. [38] proposed an adaptive scheme to select the optimum threshold for the SPM by assuming that a skin region is coherent and homogeneous in texture. The segmentation accuracy of skin regions can be further improved by incorporating texture analysis into the parametric modelling framework. Texture features can be extracted by performing texture analysis in various domains, such as grayscale [75, 80], colour [81], or skin map [82]. To extract texture features, different authors used different feature descriptors. Jiang et al. [83] proposed a new approach by incorporating texture and space analysis into a standard SPM framework. An initial skin mask is derived for an image by thresholding the SPM with a low threshold. Subsequently, textural features are extracted using Gabor wavelets. This gives a textural map for the image. The texture map is thresholded based on the assumption that background regions are coarser than skin regions. This gives a texture mask, or texture filter, which is later combined with the initial skin mask to obtain a more accurate skin mask. This reduces the false acceptance error significantly. Finally, watershed segmentation is employed with a set of well-defined region markers to grow skin regions and reduce the false rejection error.
H.-M. Sun [84] proposed a local adaptation scheme for the Bayesian classifier proposed by Jones and Rehg [13]. A local skin model is generated from a set of skin pixel samples taken from the image. Finally, the local model is combined with the global (trained) model using a weighted-sum approach. P. Ng and C. M. Pun combined 2-D Daubechies wavelet-based texture analysis with a GMM-based colour model [85]. The 2-D Daubechies wavelets are calculated using sub-images centred at each pixel location. The texture feature at each pixel location is represented by the wavelet energy vector $v_e$, which is obtained by applying Shannon entropy to the wavelet coefficient vector $v_c$. The $v_e$ vectors for all the pixel locations are then grouped into a set of clusters using the k-means clustering algorithm. Finally, some of the clusters are marked as non-skin based on their Shannon entropies and eliminated accordingly. Kawulok et al. [9] used linear discriminant analysis (LDA) to derive discriminative features between skin and non-skin regions. In this method, the LDA projection matrix is derived using colour and local texture features from a set of labelled images. The LDA projection matrix therefore depends on the training data: it is chosen to give the best possible inter-class discrimination for those training samples.
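For two classes, LDA reduces to Fisher's linear discriminant, whose projection direction is $w = S_W^{-1}(\mu_{\text{skin}} - \mu_{\text{non-skin}})$, where $S_W$ is the within-class scatter matrix. The sketch below illustrates only this projection on synthetic 2-D feature vectors; the features, the midpoint threshold and the function names are assumptions, and this is not the feature set or training procedure of Kawulok et al. [9]. It does make the dependence on training data explicit, since $w$ is computed entirely from the labelled samples.

```python
import numpy as np


def fisher_lda_direction(skin, nonskin):
    """Two-class Fisher LDA: w = S_w^{-1} (mu_skin - mu_nonskin), unit length."""
    mu_s, mu_n = skin.mean(axis=0), nonskin.mean(axis=0)
    # within-class scatter = sum of the two class scatter matrices
    s_w = (np.cov(skin, rowvar=False) * (len(skin) - 1)
           + np.cov(nonskin, rowvar=False) * (len(nonskin) - 1))
    w = np.linalg.solve(s_w, mu_s - mu_n)
    return w / np.linalg.norm(w)


def fit_lda_classifier(skin, nonskin):
    """Projection direction plus a threshold halfway between projected means."""
    w = fisher_lda_direction(skin, nonskin)
    threshold = 0.5 * (skin.mean(axis=0) @ w + nonskin.mean(axis=0) @ w)
    return w, threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic 2-D vectors standing in for colour/texture features
    skin = rng.normal([180.0, 140.0], 10.0, size=(200, 2))
    nonskin = rng.normal([90.0, 110.0], 10.0, size=(200, 2))
    w, t = fit_lda_classifier(skin, nonskin)
    print(w, t, (skin @ w > t).mean(), (nonskin @ w <= t).mean())
```

Because $w$ points from the non-skin mean towards the skin mean, samples projecting above the threshold are labelled skin.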
Another approach uses spatial analysis of skin regions by exploiting the spatial alignment of skin pixels and their relation with neighbourhood pixels [14, 80, 86, 87]. These approaches significantly reduce false positives in detecting skin regions. In general, all these spatial analysis-based methods are built on a standard SPM. Ruiz-del-Solar and Verschae [86] proposed a skin detection method which uses controlled diffusion. The controlled diffusion process has two steps: a) extraction of diffusion seeds, and b) the actual diffusion process. The diffusion seeds are extracted by thresholding the SPM with a high threshold. In the diffusion step, skin regions are grown from the seeds by including the neighbourhood pixels which satisfy a given diffusion criterion. The criterion depends on two factors: a) the difference between the source and a test pixel in the diffusion domain, and b) the SPM value at the test pixel location. Therefore, this method works well if skin regions have sharp boundaries. A leak in diffusion may occur if there are smooth transitions between pixels from one region to another. In 2010, Kawulok proposed an energy-based scheme for skin blob analysis [87]. Pixels with high SPM values are selected as skin seeds. These seed regions are subjected to morphological erosion to further reduce false acceptance. In this method, seed pixels are assumed to have maximal energy, which is spread over the image. The amount of energy transferred to an adjacent pixel from a source pixel depends on the skin probability of the adjacent pixel. A pixel is excluded from the skin region if there is no energy left to be passed on to it from a source pixel.
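The controlled diffusion of Ruiz-del-Solar and Verschae [86], described above, can be sketched as breadth-first region growing on the pixel grid. In this minimal sketch the grey level is assumed as the diffusion domain, 4-connectivity is assumed, and `seed_threshold`, `min_spm` and `max_diff` are hypothetical parameters rather than values from [86].

```python
from collections import deque


def controlled_diffusion(spm, values, seed_threshold=0.9, min_spm=0.5, max_diff=10.0):
    """Grow skin regions from high-SPM seeds over 4-connected neighbours.

    spm:    2-D list of skin probabilities.
    values: 2-D list of the diffusion-domain feature (grey level, assumed).
    """
    rows, cols = len(spm), len(spm[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    # a) diffusion seeds: SPM above a high threshold
    for r in range(rows):
        for c in range(cols):
            if spm[r][c] >= seed_threshold:
                mask[r][c] = True
                queue.append((r, c))
    # b) diffusion: accept neighbour q of source p if q is close to p in the
    #    diffusion domain and q's own SPM value is high enough
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]:
                close = abs(values[nr][nc] - values[r][c]) < max_diff
                if close and spm[nr][nc] >= min_spm:
                    mask[nr][nc] = True
                    queue.append((nr, nc))
    return mask
```

Since each test pixel is compared only with its immediate source, a chain of small steps can still drift across a smooth boundary; this is the diffusion leak mentioned above.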
In 2013, M. Kawulok [14] proposed a propagation-based region growing method which utilises the spatial relationship between pixels. Kawulok's method is based on Dijkstra's minimum path-cost algorithm [88]: each pixel is considered an independent node, and the image is the corresponding graph. In this method, the optimum values of the region growing parameters are selected manually.
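A minimal sketch of minimum path-cost propagation with Dijkstra's algorithm is shown below. Each pixel is a graph node joined to its 4-neighbours. The step cost $1 - P(q)$ of entering pixel $q$, the seed threshold and the cost limit are hypothetical choices made for illustration; they are not the cost function of [14], but they play the role of the manually selected region growing parameters.

```python
import heapq


def dijkstra_skin_propagation(spm, seed_threshold=0.95, max_cost=0.5):
    """Minimum path-cost propagation from seeds over the 4-connected pixel graph.

    Hypothetical step cost: entering pixel q costs (1 - spm[q]). A pixel is
    labelled skin if its cheapest path from any seed costs at most max_cost.
    """
    rows, cols = len(spm), len(spm[0])
    cost = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r in range(rows):
        for c in range(cols):
            if spm[r][c] >= seed_threshold:
                cost[r][c] = 0.0
                heap.append((0.0, r, c))
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > cost[r][c]:
            continue  # stale entry: a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (1.0 - spm[nr][nc])
                if nd < cost[nr][nc]:
                    cost[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    mask = [[cost[r][c] <= max_cost for c in range(cols)] for r in range(rows)]
    return mask, cost
```

Unlike the local diffusion criterion, the accumulated path cost lets a single low-probability pixel block propagation beyond it.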
Figure 1.8: Framework proposed by Tan et al. [10]: eye detector, 2-D histogram, Gaussian model, and fusion strategy.

There is another class of skin segmentation approaches which use prior information about the actual skin colour of a person present in an image. In general, human skin colour does not show significant variations over the body. So, a face detector can be used to detect the face and extract a set of pixels belonging to the facial region. The prior information obtained from the facial pixels is then utilized to segment out other skin regions of the human body. A global skin detection model can be locally adapted according to the distribution of facial skin pixels. Fritsch et al. [89] used face detection to derive a local skin model for skin region tracking. In 2008, Kawulok [19] proposed a dynamic skin model using pixel characteristics of facial regions, in which the global pixel statistics are fused with the local statistics of facial skin pixels. Yogarajah et al. [20] used a dynamic thresholding-based method for skin detection. In this method, a dynamic threshold is obtained from the characteristics of skin pixels extracted from the facial regions. Tan et al. [10] proposed a fusion-based skin detection method using face detection, in which a smoothed colour histogram and a Gaussian model of skin are fused together. Kawulok et al. [21] showed that selecting seed points using facial pixels can improve the detection accuracy of a region growing-based skin detection method. Pixels extracted from the face provide a good estimate of the colour distribution of skin regions, even in the presence of skin-like backgrounds and/or poor illumination conditions.
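The idea of adapting a global model with facial skin pixels can be sketched as follows. This is a hypothetical illustration, not the method of any single paper cited above: a diagonal Gaussian is fitted to face pixels (for example their (Cb, Cr) values, assuming a face detector has already supplied them), its unnormalized likelihood is used as a local score in $(0, 1]$, and the score is combined with a global skin probability by a weighted sum, in the spirit of the local/global combination of H.-M. Sun [84]. The weight `alpha` and the variance floor `eps` are assumed parameters.

```python
import math


def fit_local_model(face_pixels, eps=1e-6):
    """Per-channel mean and variance of facial skin pixels (e.g. (Cb, Cr))."""
    n = len(face_pixels)
    dims = len(face_pixels[0])
    mean = [sum(p[d] for p in face_pixels) / n for d in range(dims)]
    var = [sum((p[d] - mean[d]) ** 2 for p in face_pixels) / n + eps
           for d in range(dims)]
    return mean, var


def local_skin_score(x, mean, var):
    """exp(-0.5 * squared Mahalanobis distance) with diagonal covariance."""
    d2 = sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))
    return math.exp(-0.5 * d2)


def fused_skin_probability(x, global_prob, mean, var, alpha=0.5):
    """Weighted sum of the face-adapted local score and a global model output."""
    return alpha * local_skin_score(x, mean, var) + (1.0 - alpha) * global_prob


if __name__ == "__main__":
    face = [(100, 150), (102, 152), (98, 148), (101, 151)]
    mean, var = fit_local_model(face)
    print(fused_skin_probability((100, 150), 0.4, mean, var))
    print(fused_skin_probability((120, 130), 0.4, mean, var))
```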
Support vector machines (SVMs), introduced by Cortes and Vapnik, can also be used for skin detection. In general, the number of training samples for skin and non-skin pixels becomes too large for the SVM to handle. Han et al. [25] proposed a skin segmentation method using an active learning-based SVM classifier and region information. SVM active learning is a well-known approach to deal with large training datasets [94]. In this method, it is assumed that the region information is robust to illumination variations and noise. Each image is divided into a number of regions. A region is selected as skin if it satisfies the following criterion:
$$\frac{NS(R_i)}{NT(R_i)} > \eta \tag{1.29}$$
where $NS(R_i)$ and $NT(R_i)$ are the number of skin pixels and the total number of pixels in the region $R_i$, respectively, and $\eta$ is a predefined constant.

A more popular non-parametric approach is the use of back propagation ANNs (BPANNs)
for skin detection [11, 95]. For example, Chen et al. [58] proposed a skin detection algorithm using a BPANN with genetic optimization. In their work, pixel components in RGB space are transformed into the normalized RGB space. The $r$ and $g$ components of the pixels are then fed into a BPANN made of 2 input neurons, 4 hidden neurons in two hidden layers, and an output neuron. The response of each neuron is characterised by a logistic sigmoid function given by:
$$f(x) = \frac{1}{1 + e^{-\sigma x}} \tag{1.30}$$
where $\sigma$ is the steepness of the sigmoid curve, $x$ is the weighted sum of the inputs, and $f(x)$ is the output. The stability and convergence of the ANN depend on the parameter $\sigma$, so they used a genetic algorithm (GA) to optimize the selection of $\sigma$. Finally, if the colour components $r, g, b$ in normalized RGB space satisfy $r > g$ or $r > b$, then the $r$ and $g$ components are fed into the BPANN classifier. Seow et al. [11] proposed a skin colour model
for face detection, which aims at reducing the effect of skin colour variations among different people. They used a 3-layered BPANN with the $r, g, b$ components as inputs, as shown in Figure 1.9. A set of 410 skin samples (each containing a $10 \times 10$ patch) is collected from skin regions belonging to different races. Since the sample set cannot represent the entire skin population, a Multi-Layer Perceptron (MLP) ANN is trained using the Back Propagation (BP) algorithm to interpolate the sample set. Finally, a $256 \times 256 \times 256$ colour cube is generated to obtain all the possible colour combinations, and these are fed into the MLP to extract the skin regions.

Figure 1.9: Skin detection using ANN proposed by Seow et al. [11]

Yang et al. [12] used an ANN along with adaptive skin modelling to detect skin regions in an image more accurately, as shown in Figure 1.10. In this method, the luminance component $Y$ of the YCbCr colour space is used to reduce the effects of illumination variations. At first, the $Y$ components are arranged in descending order and divided into multiple equidistant intervals. After that, pixels belonging to the same luminance interval are selected, and the corresponding mean and covariance matrix of the selected pixels in the $Cb, Cr$ space are calculated. The luminance mean of each interval and the covariance and mean in the $Cb, Cr$ space are used to train a three-layer ANN. Finally, the output of the ANN is fed to a Gaussian classifier for skin classification.
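The interval statistics used as ANN inputs in the method of Yang et al. [12] can be sketched as below. This is an illustrative NumPy version under assumptions: equal-width luminance intervals spanning the observed $Y$ range, and inputs that are already skin pixels in YCbCr. The descending sort is omitted because grouping pixels by interval does not depend on it. The three-layer ANN and the Gaussian classifier are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def luminance_interval_stats(y, cb, cr, n_intervals):
    """Mean Y and (Cb, Cr) mean/covariance for each equal-width Y interval."""
    y = np.asarray(y, dtype=float)
    cbcr = np.column_stack([cb, cr]).astype(float)
    edges = np.linspace(y.min(), y.max(), n_intervals + 1)
    # interval index 0 .. n_intervals - 1 (the maximum Y falls in the last one)
    idx = np.digitize(y, edges[1:-1])
    stats = []
    for k in range(n_intervals):
        sel = idx == k
        if sel.sum() < 2:  # a covariance needs at least two samples
            continue
        stats.append({
            "interval": k,
            "y_mean": float(y[sel].mean()),
            "cbcr_mean": cbcr[sel].mean(axis=0),
            "cbcr_cov": np.cov(cbcr[sel], rowvar=False),
        })
    return stats


if __name__ == "__main__":
    y = [50, 55, 60, 200, 205, 210]
    cb = [100, 102, 104, 110, 112, 114]
    cr = [150, 151, 152, 140, 141, 142]
    for s in luminance_interval_stats(y, cb, cr, 2):
        print(s["interval"], s["y_mean"], s["cbcr_mean"])
```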
Figure 1.10: Framework proposed by Yang et al. [12] (flowchart: image database; colour space conversion from RGB to YCbCr; division of the total range of Y into a finite number of intervals N; statistics of the mean and variance of each interval; BP neural network; adaptive skin model; skin colour classification of a test image).
skin regions. Therefore, under unconstrained illumination and background conditions, the skin pixels cannot be located perfectly. Also, it requires a set of labelled initial frames to train an SVM classifier, and the initial positions of skin-coloured objects need to be determined. Han et al. [101] proposed a skin segmentation and tracking algorithm for sign language recognition using SVM active learning. The SVM is trained on a set of initial frames, and active learning makes the algorithm computationally less expensive. However, the major drawback of this method is its inability to handle varying illumination conditions: the SVM would need to be re-trained at every frame to cope with them, and training an SVM repeatedly for every frame is computationally very expensive. Liu et al. [102] proposed a dynamic skin detection algorithm for videos. In this method, a face detection-based model update scheme is proposed for varying illumination conditions. However, only global illumination variations are considered, and the method is not suitable for local illumination changes, which mostly occur due to moving body parts.
1.5 Research Motivation
From the brief literature survey presented in this report, it is evident that a significant amount of work is needed to efficiently detect skin regions in images under different environmental conditions. Additionally, detecting skin regions in videos in the presence of varying illumination conditions is another important task. Combining these requirements in one algorithm is a major challenge. Accordingly, this thesis looks into several aspects of using the chromatic and textural information of skin regions, and aims at developing suitable algorithms that address some limitations of the existing methods. The motivations behind this research work are given below:
(i) Typically, a chromatic and/or textural discrimination is observed between skin and non-skin regions of an image. Kawulok et al. [9] used a linear discriminant analysis (LDA)-based most-discriminative feature extraction approach for skin detection. However, the extracted features depend entirely on the training data. Natural images are in general uncorrelated, in the sense that the spatial distribution of texels and colours is quite random. Therefore, the discriminative features extracted by the LDA may not be the most discriminative features for an unknown image. Hence, an image-specific discriminative feature extraction