Journal of Thooretical and Applied Information Technology 20"' October 20l3 Yo! 56 No.2
C 2005 - 2013 JATIT & LLS. All lights reseNed
lS:>N. QYYRMャャmセ@ WWW. Hllll.org I -ISSN: 1817-J l?S
INF ANT
CRIES IDENTIFICATION
BY USING CODEBOOK AS FEATURE MATCHING,
AND MFCC AS FEATURE EXTRACTION
1i\1E011ANITA OE\\ I RENA '-'Tl , 1
AGUS BUONO, 1WISNU ANANTA KUSUMA
1
Diploma Program of Bogor Agricultural Universi ty
2Compukr Science Department of Bogor Agricultural University
E-mnil: 1mcJha nuho a "ahoo.com. セオ、・セィ。@
ii
vahoo.co.id , lananta ci'iph.ac.idABSTRACT
In this paper, \\C focused on automation of Dunstan Bab) Language. This system uses MFCC as feature
e\lraetion and codebook as ti!oture matching. The codebook or clusters is made from the proceeds of all the buhy's cries du1a, by using the k-mcans clustering. The datn is taken from Dunstan Baby Language videos 1hat has been processed. The d:ita is divided into two, 1raining data and testing data. There are 140 training dato, each of \\hich represents the 28 hungry infant cries. 28 sleepy infant cries, 28 \\anted to burp infant cries, 28 in pain infant cric,. and 28 uncomfortable infant cries (could be because his diaper is wet/too hot/cold air or an} thing else}. lhe ャ」セオョァ@ data is 35, respectively 7 infant cries for each type of infant cry. The research '.tr) i ng frame length: 25 ms/frame length
=
275. 40 ms/frame length = 440. 60 ms/ frame length = 660. O\ erlap frame: 0°o. RUセッN@ 40%, the number of codewords: I to 18, except for frame length275 and O\.crlap frame = O u-,ing I to 29 clusters. The identification of this type of infant cries uses the minimum distance of euclidean distance. Accuraq \aluc is between
37%
and94%.
Sound 'ch' is the most familiar. whereas sound ·m,11· is always missunderstood and generally it is known as 'neh' and 'eairh' . The weakness point of this research is the silent is onl) be cut at the beginning and at the end of speech signal. Hopefully. in the nc\t research. the silent cnn be cul in the middle of sound so that it can produce more specific sound. It has impucl on the bigger accurac) as \\ell.Keywords: Codebook. D1111.11t111 baby la11g11age, Infant cries. K-means cf11sreri11g, MFCC
I. I NTRODUCTION
The first verbal communication which is mastered b) a bab> is cr:ang. Currently, there is a system that learns the meaning of a 0-3 month old infanl cries \\hich is called dオョセエ。ョ@ !lab) Language (DDL). DBL is introduced b) Priscilla Dunstan, nn
Auslralinn mu!>ician "ho has gol 1nlent to
remember all 1..inJs of sounds. kM\\O as sound photograph .. '\ccording to DBI 'cr-sion, there are lhc bab) languages: .. nch" means hunger, "owh .. means tired \\hich indicates lhat the baby is getting sleepy. "eh'. means that the baby \\ants to burp, "eairh" means pain (\\ind) in the stomach. and "heh" means uneomfonablc (could ll<! due to a \\<Cl diaper, too hot or cold air, or an.> thing else).
The expenise to determine the meaning or infant cries in DBL version is still a bit sparse so the information of bab)
·s
Lr) meaning is not readil) a\·ailable to the parenti.. Current!}. a S)Stem to transl';.:r knowledge nbout DBL is b) attending a training or seminar, or b) ウエオ、セ@ ing their own infant cries meaning (in DBL version) \\hich is nlread)'packaged in the fonn of optical discs. The materials of DBL can also be downloaded on the internet. DBL system users, particularly in Indonesia, will be
mun: connc.Jcm '"ith the セオョ」ャオZゥゥッョ@ they make if
there is a software that can automatically generate the meaning of their infant cries. It can strengthen their conclusions. In addition, this software will olso be useful for parents who do not attend any DBL 1taining or seminar, so parenls can understand the language or the crymg of their baby.
Research on infant cries has been done by researchers, such as: cries classification of normal nod abnormal (hypoxia-oxygen lacl..s) infant by using a neural neh\Orl.. which produces 85%
accuracy
f
I], the classificntion of healthy infants and infants who experienced pain like brain damage, lip cleft palate, hydrocephalus, and sudden infant death syndrome by using Hidden Markov Model(HMM)
which produces91%
accuracy[2].
Journal of Theoretic al and Applied Information Technology 20111 October 2013
Yo!.
56 No.2c
2005 - 2013 JA TIT & LLS All r1ghts reserved·ISSN QYYRMXV Tセ@
オセゥョ・@
a
neural ncl\\ori. '' hicl1 produces 86%accurac)
13 ].
The classilication of' infont cri..:s <;tudics have prc,iously used a neur.il ュNセャ|LオイャNN@ or I l\.1M as the classifier. The research to ゥ、」ョエゥエセ@ the DBL infant cries used codebook as a clussilicr or pattern identifier \\hich is obtained from the k-means clustcring and MFCC as feature .:'\traction. The choice of the method is based on the high accuracy results from the researches, セオ」ィ@ as: automatic recognition of bird:.ongs using セャャ ᄋ cc@ and Vector Quantization (VQ) codebook and produces 85%
accurac} [ .t ]. Besides, the research about speaker n.:cognition S)'Stem also ウオ」」」セヲGオャャI@ created b) using MFCC dan VQ {5). The same n.:search that Singh and Rajan (6) did, got 98.57°0 uccuracy by doing speaker recognition イ・セ。イ」ィ@ using MrCC and VQ. The research about spc.:ch n.:cogni1ion and verification using MFCC and VQ "hich Patel and Prasad (2013) did can do n:cognition '' ith training error rat..: about I 3°'o (7). I he mal..ing of this codehook u!ling k-mcans clustering.
This research aims to perform thc modeling of codebook me1hod \\ilh J..-mcans clustering technique, and Mel fイ」アオ・ュ Z セ@ Cepstrum Coellicient!> HセQfcHGI@ to identi I)' 1hi: infant cries. I he scope of this research arc: the infant cries classification used is the ver:;ion ol' the Dunstan Dah) Language \\hich is di' ided into groups of hung!') bab}, tired 'sleep} bab). like burping bab) , infnnts e'\pcrience pain in the stomach, and uncomfortable bab} This '>nllmm.: is used 10
identit) the meaning of0-3 month old infant cries.
2. M AT EIUA L S AN D l\1 ETllOOS
The ュ・エィッ、ッャッァセ@ of thi' rc!>Carch con<;i<;ls of SC\cral StJges of proccs:.: data collection. preprocessing. codebook modeling of infant cries. testing and analysis. and interface manufacturing. The methodology of the inf.mt cries meaning
itlemific::uion process is sho'' n in rigurc I .
2.1 Data Collection
The data used for this セエオ、|@ ts the infant cries that arc grouped into li\c ,;·1es of hungr)' bab). sleepy baby, like burping 「。「セN@ in pain bab) (in the stomach), and the uncomfort.ihl1.: bab). The data is taken from Dunstan Bab) Lmgu:igc 'ideos that has been processed.
The data is di"ided into t\\O, training data and testing data. There arc 140 training data, each of \\hich reprcsems the 28 hungr} infant cries, 28 sleep) infant cries, 28 "anted to burp infant cries, 28 in pain infant cries, and 28 uncomfortable infant cries (could be because ィゥセ@ di:ipcr is wet/too
E-ISSN: 181H t95
hot/cold air or ョョケエィゥョセ@ else}. The testing data is 35, rcspccti,el> 7 infant cries for each type of infant cry.
Stan
Data collection
Preprocessing Silent cuuung
Feniure extmction· MFCC
Infant cries
Preprocessing' Silent cununa
Fcatun: cxttact1on. MFCC
lntcrfocc milking
finish
Figure I. 711e methodology of iden1ifyi11g the meaning of a crying baby
2.2 Preprocessing
Silent cutting is in the preprocessing stage and
the feature extraction uses MFCC method. The MFCC flow diagram can be seen in Figure 2.
2.3 Infant C rin Codebook M od eling
The codebook that is going to be made is the codebook of ench infant cry data. The codebool.. of clusters is m:ide from the proceeds of all the baby's cries data, by using I.he k-means clustering. Codebook is :i set of points (vectors) that represent the distribution of a particular sound from a speaker in a sound chamber. Each point of the codebook is known as a code\\ord. The recognition process m..:ans that in e"ery sound, the distance of the sound is always counted to each speaker codebook. 'I he distance of incoming sound signals to 11
speaker codebook is calculated as the sum of the distances of C<ich frame \\hich is read 10 the nearest codeword. Finally, input signal is labeled
as
[image:2.628.43.549.72.702.2]Journal of Theorotical and Applied Information Technology 20"' Oc!obe! 2013. Vol 56 No.2
Q 2005 - 2013 JATIT & LLS. All rights reserved·
ISSN 1'192-lllM!t ----='=''=''=W=.j=m=it=o=rs::._ _ _ _ _ _ __ _ _ _ _ =E·..:.l::.SS::N:...::....:: QZNNZXNNZNNQWNNZNNNjNNZNNi ⦅ セ NNNZNN@
compilation result i!> 」。ャャセ、@ t.:n<lebook. I his codebook a' a speaker model
fKI.
speech signal
i
...
y
I· ramc hH0 O., 01 ... , O, 01
..
\\ llldO\\jng
I', (11)
"'x,
(n)1r( 11), () · II \ ' -I{ ' 7 1
LNセLLIMPNUjMPTV」\^@ - - - }
,\'-1 o,.,,s ,,·-1
+
rFT
Xn =
L:;J
X"e _m!!..
11 =0, 1, 2, .. ... ,N - 1
•
Md frcquenC} \\rappm;:: md(I)"' 2595 · 1011,. (I + セI@
M filter is obtained.then calculmc lhc i\ lcl spectrum
(N-1
)
x,
=tou10
l:
IX(k)I · 11,<k) •k•O
I= 1, 2, .... ,M
111 (k) is triangle filter' aluc to-1
•
Ccp-.trum Cocfic1ents· [)1,m:tc Cosme
Trnn,form
c
1 = ; .. セB@ 1x
,cos -(''1-1 ")
2- ; ; 1 1,2,3,. .. J,J-number of c:ocffic1c111s, M number of filter
Figure 2· Diagram Alur .lfF('C
K-means clustering i!; a \\ell-known panitioning method. o「ェ・」エセ@ arc classified as belon&ing to one of k groups. k chosen a priori. Cluster membership is de1ermme<l b) calculating the centroid for each group (the muhi<limcnsional \ersion of the mean) and assigning each object 10 the group with closest centroid. !"his approach minimi£es the overall \\ithin-clustcr dispersion b) iterative reallocation of cluster mem bers.
The pseudo code of the k-mca11' algorithm is to explain hO\\ it \\Orks [91 :
..I Ch<>ose Kus the 11umher of .1m1 rs
8. lnitialb! the codebook
vea
ws of the A. clusters Hイ。ョ、ッュセセᄋN@ for instance)C. For every• new sampll! veclor.
C. I Compl/fe 1he 、ャウイオョ」セ@ belliee11 the new vector and every cluster 's codebook vector.
C 2 Recompute the closes/ codebook vector with the new \'ector. using a learning rate that decreases in time
The stages of codebook making for a speaker are as follows:
I At each pronunciation (n pronunciation
as
training data), feature extraction is performed b} using MFCC technique on each frame with a certain length and overlap.
2 All frames of n pronunciation are combined into one set and unsupervised clustering is done by using k-mc::ins clustering technique, choosing a number of clusters according to a number of the
desired codeword.
2.4 Ttstlng and Analysis
S1agc of testing means a test to identify the meaning of infant cries. The !low process of identilicatiotVrecognition phase are:
I Each new utterance which is go into the system is read frame by frame, (e.g. the number of frame:. obtained is T), and feature extraction using a MFCC technique is done.
2 Calculate the speech input distance signal to each speaker's codebook in the system.
3 Decision: assign a label on the sound input in accordance with the speak.er \\ith the smallest codebook distance.
Speech input distance \\ith codebook is formulah:<l
as
follows:I For each frame of the incoming speech inpu1, the di:.tancc to every codeword is calculotcd and codeword \\ith the minimum distance is selected.
2 The distance between the input speech and the codebook is a number of minjmum distance (equation I ). discance(input, codebook)
=
:Ef.1vco11 .. キッGZMGjセ{、Hヲイ。ュ・LL@ codeword1c)J
( I)
Distance used in this research is Euclidean distance. Euclidean distance between i object is defined by equation 2 [10].
(2)
Journal of Theoretical and A p plied Information Tech n o log y RPセ@ October 2013. Vol. 56 No.2
tl:l 2005 . 2013 JATlT & LLS. All rights reserved·
ISSN: 1992-36-45 \\W\UU!i\.urg £-ISSN: t817-.J 195
• Frame length. 25 ュセOャイッュ」@ kneth - 275.
40 ms/frame kngth - 440.
60 ms/ frame length セ@ 660. • O\erlop frame : 0°o. 25%, 40%.
• the number of co<lewords/numb..:r or clusters: to 18. except for frame length 275 and overlap ""' 0 using I to 29 clusters
The accuration vnlue of each combination of factors and levels "ill be calculated by using equation 3.
accuracy
a numbL'T of ci:scrng data with n1ht 1dcnt1/1catlon
00
= xi
a number of testrnq darn
(3)
2.5 Interface Making
The interface mal..ing uf th1. infant cries identificotion is made bused on the training dntn that produces the ィゥァィ」セャ@ accuracy.
3. RESULTS AND DI SCUSS IONS
[image:4.619.28.539.65.759.2]The making of thi c; research ゥセ@ using Matlab R20 IOb 'crsion 7 .1 1.0.584 soil\\ arc. Result of accurac} comparison using cud idcan distance "ith testing data is sho\\ n in Figur..: 3. According to Figure 3 the higher :iccuroc) "hen the frame length
=
440. 01·erlap frame 0 .4. anJ k - 18 \\ith 94°·0accurac)'. The same ;u.:curaC) can be got |セィ・ョ@
frame length - 660, orerfap frame 0.25, and k
-14.
The Grapb or Accuracy V1lue Comptimon
I 2 J 4 S 6 7 t 9 10 11 12 13 I• 1$ 16 17 II
nulnMr or chuters (k) - frame length 27S overlap frame 0 - frame length 27S. overlap frame 0 2S
-rmme lcnath 275, overlap frame. 0.4 - frame lcnglh 440. ッセ・イャ。ー@ frame: 0 - frame length 440, ッセ」イャョー@ frame: 0.25 - frame length 440, overlap frame 0.4 -rmme length 660. O\erlap frame 0 -frame length 660. O\trlap frame 0 2S - fnune len 660, O\Crla frame 04
F1gurtJ Tht graph of acc11racy mlue comparison usmg
testing dolt/
/able Z: Tes1111g 11.smg Testing Data when k-18,
Frame length -660, and overlap
=
0,2SData testing to- Type of
I 2 3 4
s
6 7 criesResult of tcs1ing using the data testing \\hen k
=
18. frame length .... 440 and oH::rlap frame .. 0.4can be seen in Table I. Resull of testing using the data testing \\hen k 18. ヲイ。ュセ@ length = 660 and O\.erlap fr.imc 0.25 i:. shown ml Jblc 2. Result of
testing using the dat:i testing ''hen k
=
18, frame length=
275 and O\l:rlap frame .. 0.:?5 can be seen in Table 3.•
'a' 'a' 'a' 'a' 'a' 'a' a-eairh'c' 'e' 'e' 'c' 'c' 'c' 'c' c-eh
'h' 'h' 'h' 'h' 'h'
•
'h' h-heh 'n' 'n' 'n' 'n' 'n' 'n' 'n' n-nehTable I: Te.mng usmg te.1t111g cl///11 when k- 18. 'o' 'o' o-owh
Frame le11g11t ·/.10, mul ol'erl.1p セ@ 0.
Tobie J: Testing using Testing Data when k=/8.
Datn testing 1 o- Type of Frame length 660, and 01•er/ap • 0,25
2 3 4 5 6 7 cries Data sample 10- Type of
'a' 'a' 'a' 'a' 'a' 'a' ·a· n-eairh 2 3 4 5 6 7 cries
'e' 'e' 'c' 'c' \:' \:' 'e' c-eh 'a' 'a' 'a' 'a' 'a' 'a' 'a' a-eairh 'h' 'h' 'h' 'h' 'h' 'h' h-heh 'c' •c• 'e' 'e' 'e' 'e' 'c' c-eh 'n' 'n' 'n' 'n' 'n' 'n' 'n' n-nch
セ
セ@
0-0\\h'h' 'h' 'h' 'h' 'h' 'h' h-heh
'n'
•
'n' n-neh 'h''o' 'o' 'o' o-owh
Journal of Theoretical and Applied Information Technology 20'" October 2013. Vol. 56 No.2
C> 2UU5 - 2013 JATIT & LLS All rights reserved
ISSN· 1992·8<1-45 \\'\\\\ .j3til org E-ISSN: 1817-319!1
I rom the result
or
testing \IC l..mm that i.ound 'ch' is the most fom1har (Table I, 1able2, Table 3).Whereas sound '011h· is al11,1ys missunderstood and general!) it is kno11n 。セ@ ·ni:h' and 'cairh' (Table 2, Table 3 ). The ュゥA^エ。ィセ@ in iJcntification is caused b)' training data variation ·owh' is bigger
than the others. Codebook ilustration ·eh· and '01\h · when k-18, frame length-=4·IO, and overlap frame
=
0.4 is shown in Figure 4. 1 he figure show codebook distribution ·01\h' ゥセ@ bigger than 'eh'codebook.
OB ,. セ@
l\ I
Gセ@ It I
セiセ@
o; , '111
't, ,,
\ J
'•'
Ij
41NLLM NN]] ]セMi@
,,-
-
-
.
セᄋ@
セMセMM
c; - '•
ve, -\
...__,. \t,, ,
II.01 - ' ' IJ
\I II
,",','-'::
----
-\
|セ@,-.."':..,' セ@
'
-
'
,
.
'- - : --
--- --
:.
....,
QNセN@
.,
O• 01The l ting co lparison \)f 11:,ting data and
trnining data C) use eucltJean distance "ith
fr:une leng1h "' 275 nnd O\'Crl.1p frame = 0 can be
seen in Figure 5.
Th• Cr•pll of AttllfA<) イオゥ。セ@ I t1h
0••• ""'
ヲイ。ュゥ。セ@ Hc .. ra" O.ata •hf'• fnu•t Lt1t1tlt セᄋ@ ·• Q\ f l•e• • t100".
-<)(/ . .
セ@
SO".
-QセN@
VPBセ@ t1•mm1 dau
so-. -1l"itm1data
·IO"o
I l セ@ 1 Q II ll
ᄋセ@
11 1·1 11 1 l.I 27 2'InumbC'r or clu,ft•.--( 1')
F1g1ue 5: The graph o/uccum<1 1t1mg '<'fling data and
trammg data Qセィ・ョヲュQQQ・@ length : -j, overlap"' 0
The illustration about the c'.planation is displa)ed in Figure 6 and I 1gur.: 7. Codcbool. illustration of ·o,.,.h', ·neh", anti · 011h' testing datn
is di5playcd in fゥァオイセᄋ@ 6. Cou.:tiook illustration of
'o\1h'. ·nch'. and ·owh' training data is displa}cd in Figure 7.
There arc 2 codebook in Figure 6 and Figure 7, that is codebook 'O\\ h' and ' neh'. When the testing
using 'owh' testing data, sound ·owh' is known
as
owh (illustration in Figure 6) but when 'owh' training data is tested, its known as 'nch' (illustration in Figure 7). It is because ·o,.,,h. training data (symbol • red coloured in Figure 7) is close to nch codebook (symbol o green coloured) (Figure 7). ll proves that evcthough 01.,.h training data has been modelled as 'owh' codebook, the testing still identify it as 'neh'.
H
°""'
...
00 0 0
146
'..
°ti ., l.dJ f!J Jll f 0.
,.. tSI ,) 0( c
"
_,
.
.
0 0
.
.
"' 0
-1'
.
Figure 6 Codebook 1/lus1ra11on of owh and neh, also 'owh ·testing data
0 Mセッ、ッ」ZNッッャ\@
,,..._
•
Gセ@
_.,_
....
.
+J
05
0
• o o •
. .
tb
11
•
! ·O.S [jl.ill+• •O•
•
. .
1111 c'
•I
•
....
D セc ッ@ Co..
...
# • 0 IS ._....
.
[image:5.628.17.546.134.720.2]セMR@ -u + + • ' "
.,
Figure 7: Codtbook 1llustral1on of owh and neh. also
'owll · lrammg data
\
Jou rnal of Thoorotical and Applied Information Technology 20"' October 2013 Vot 56 No.2
Q 2005 • 2013 JATIT & LLS. All rights reserved·
ISSN: QセRNウセウ@
u
[image:6.626.20.533.49.768.2]...
セ@Figure 8. 7he /111erfacc Of lde1111/1,a11011 Of lnfafll Cms
4. CONCLUSIO:\
Codebook model and \.II CC 11ith the higher accunicy is: frame length I 10, m crlap frame ... 0.4. k - 18. The dis1ancc usi1w \\ hich produce the higher <lccuraq is euclidean d1-,rnm:c. That model can produce accuracy recognition of infant cries \\ith the higher about 94%. 'I he rc:;carch is just cut the silent at the beginning anJ al the end of speech signal. I loJ)(!full), in 1he nc\t イ」セ。イ」 ィN@ the silent can be cut in the middle of <,oumJ so 1hat it can produce more specific sound. II ha' impact on 1he bigger accurac)' as \\Cll.
REFRENCES:
[I] Poe I M, Ekkcl I . Anal) 1ini; Int am Cries Using a Committee or Neural '.\ct\\\ll'kS in order to Det1..-ct I lypoxia Rdutc<l Disorder.
International Journal 011 I 1if'< wl l11telligence Tools (IJAll) \ ol. 15, 'l'. 3. 2006, pp. 397. 410.
[2J Lederman D, Zmora E, I luuschild t S, Stellzig· Eisenhauer A. \\'crmkc K. 2008. Cla!>sitication of cries 11f inJJnls \\ilh clefl-palaLc using parallel hidden f\ l,1rkov models.
/111ernationol Federatio11 Ii.Jr Hcdical and Biolo[!ical ᆪQQセQQQ\_・・イQQQァ@ \ ol 16, 2008, pp.
965-975.
[3) Rc)cs-Galavil OJ. r・ケ 」ウᄋcイセオᄋ」ゥ。@ Ct\. 2004. i\ S)stem for the Processing of Infant Cry to Recogni1.c Pa1hologics in lll!ccntl) Born
Babies with l\cural Nl.'I\\ orl..s. lt11ernatio11ul Speech Com111u11icatio11 ls.wciatlon, Rusiu, Sep1ember 20-22, 200..J.
(4] Lee C. Lien C. I luang It Automatic Recognition
or
Birdslings Using Mel-frcqucncy Ccpstral Cocllicicnts and Vec1or Quantization. lntl.!f!Wlional .\/ulliConferenceE-ISSN: 1817.J l9S
of Engineers and Computer Scientists, Hong Kong, June 20-22. 2006.
[5] Kumar C. Rao PM. Design of an Automatic Speaker Recognition System using MFCC, Vector Quantization, and
LBG
Algorithm./11tcrna1io11al Journal on Computer Science and £11gi11eering (JJCSE) Vol. 3, No. 8, 2011 , pp. 2942-2954.
[6] Singh S. Rajan EG. Vector Quantization Approach for Speaker Recognition using MFCC and Inverted MFCC. lnternarional Journal of Compurer Applications Vol. 17, No. I , 2011 , pp. 1-7.
(7) Patel K, Prasad RK. Speech Recognition and Verification Using MFCC & VQ
International Journal of Emerging Science and Engineering (JJESE) Vol. I, No. 7, 2013. pp. 33-37.
[8) Linde Y, Ouzo A, Gra.> RM. 1980. An Algorithm for Vector Quantizer Design.
IEEE
Transoclions On Communications, Vol. 28, No. I, 1980, pp. 84-95.[91 Abbas OA. Comparisons Between Data Clustering Algorithms. The lnrernational Arab Journal of Information Technology, Vol.
5,
No. 3. 2008,pp.320-325[I OJ Brindha M, Tamilselvnn GM, Valarmathy S,
Kumar MA, Suryalakshmipraba M. A Comparath
c
Stud) of Face Authentication Using Euclidean and Mahalanobis Distance