Do Thi Loan et al., Journal of Science & Technology (Tạp chí Khoa học & Công nghệ) 112(12)/2: 89-95
RESEARCH INTO A METHOD FOR DISCRIMINATING SPEECH FROM MUSIC
Do Thi Loan*, Luu Thi Lieu, Nguyen Thi Hien
College of Information and Communication Technology - Thai Nguyen University

ABSTRACT
Automatic discrimination of speech from music is an important tool in many multimedia applications. To discriminate speech from music, we used three features: the high zero-crossing rate ratio (HZCRR), the low short-time energy ratio (LSTER) and the spectrum flux (SF). The algorithm used for both training and recognition is K-NN (K Nearest Neighbor). The data consist of music segments of many genres, from instrumental music to music with lyrics (Vietnamese music, Rock, Pop, country music), and speech segments of male and female voices in Vietnamese. In this paper, the initial aim of our research is to discriminate two classes of sound, speech and music. The results are fairly accurate: approximately 84% for speech and 92% for music. In the future, we hope to develop a system able to discriminate more classes of sound.
Keywords: discrimination, speech, music, Vietnamese music, Vietnamese.
INTRODUCTION
Discriminating speech from music is one component of audio signal classification systems (ASC, Audio Signal Classifier) [1], of computational auditory scene recognition systems (CASR, Computational Auditory Scene Recognizer) [2], of systems that recognize television programs [3], [4], and of automatic music transcription systems (AMTS) [5]. Building a complete system is very difficult because sounds are extremely rich and varied: each kind of sound has its own characteristics, and their combinations produce countless different types of sound, which strongly affects the classification of auditory scenes. Most studies on discriminating sound classes are tailored to a particular case, depending on the number of classes the authors consider and on other constraints. For example, sound may be classified into four classes (music, speech, noise and silence) [4], [6], or only into speech and music [3], [7].
DIFFERENCES BETWEEN SPEECH AND MUSIC
* Tel: 0972998865. Email: dtloan@ictu.edu.vn

An audio signal carries meaning over short time intervals. When it is examined over a sufficiently short interval (between 5 and 100 ms), its characteristics are fairly well defined. Over longer periods, however, its properties change to reflect the nature of the sequence, such as a speech or a piece of music. In this section we give several observations on the differences between speech and music:
- Tonality: tonality describes the tonal character of the sound waveform. Music tends to be produced from a wide variety of frequencies, whereas the tones of speech come from the speaker's own intonation and voice.
- Alternating sequence: speech consists of noise-like segments alternating with pauses, whereas music does not. In other words, the spectral distribution of speech varies more randomly than that of music.
- Bandwidth: about 90% of the energy of speech is usually concentrated below 4 kHz (and limited to 8 kHz), whereas music can extend up to the upper limit of about 20 kHz.
- Energy distribution: the energy of speech is usually concentrated at low frequencies and then falls off very quickly at higher frequencies, whereas the energy of music is spread more evenly.
- Fundamental frequency: for a given speech signal the fundamental frequency can be determined, but for music it generally cannot.
- Tonal duration: the duration of vowels in speech is very regular, whereas music shows a much wider variation in the length of its tones, since it is not constrained by the process of articulation.
- Short-time energy: the energy of speech signals varies much more than that of music signals.
- Zero-crossing rate: it depends on the particular music and speech signals, but the zero-crossing rate of speech is usually higher than that of music.
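The bandwidth observation above can be checked on a single frame by measuring what share of its spectral energy lies below a cutoff. A minimal sketch in plain Python (standard library only), assuming a mono frame given as a list of floats; `energy_fraction_below` is a hypothetical helper, and the naive DFT is only for illustration (a real system would use an FFT such as `numpy.fft.rfft`):

```python
import cmath
import math


def energy_fraction_below(x, sample_rate, cutoff_hz):
    """Share of a frame's spectral energy lying below cutoff_hz.

    Naive O(N^2) DFT over the one-sided spectrum (bins 0..N/2) of a real
    signal; interior bins are counted twice because their negative-frequency
    mirror carries the same energy.
    """
    n = len(x)
    total = below = 0.0
    for k in range(n // 2 + 1):
        # X[k] = sum_m x[m] * exp(-j * 2*pi*k*m / n)
        xk = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        power = abs(xk) ** 2 * (1 if is_edge else 2)
        total += power
        if k * sample_rate / n < cutoff_hz:
            below += power
    return below / total if total > 0 else 0.0


# Example: 20 ms frames (320 samples) at 16 kHz holding pure test tones.
SR = 16000
low_tone = [math.sin(2 * math.pi * 1000 * t / SR) for t in range(320)]
high_tone = [math.sin(2 * math.pi * 6000 * t / SR) for t in range(320)]
low_ratio = energy_fraction_below(low_tone, SR, 4000)    # close to 1
high_ratio = energy_fraction_below(high_tone, SR, 4000)  # close to 0
```

Averaged over many frames of real recordings, the same ratio would be expected to sit near 0.9 or above for speech, per the observation above.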
FEATURE SELECTION AND METHOD FOR DISCRIMINATING SPEECH FROM MUSIC
Many audio signal features have so far been used to discriminate speech from music, or in other classification systems. Each study proposes its own set of audio features and a method for using them in classification.
Audio features are usually divided into two main types: physical features and features describing human perception of sound.
Physical features are characteristics in the frequency domain and in the time domain, such as amplitude, zero-crossing rate (ZCR), short-time energy, Mel-frequency cepstral coefficients (MFCC), linear spectrum pairs (LSP, Linear Spectrum Pair) [6] and spectrum flux (SF).
Perceptual features are characteristics perceived by humans, such as rhythm, pitch, duration, timbre and so on. As many previous studies have shown, physical features alone are almost always sufficient for discriminating speech from music in particular, or other sound classes in general. In this paper, therefore, we also use only features related to the frequency and time domains (physical features).
Based on our analysis of the differences between music and speech in acoustic properties, frequency band and energy distribution, we selected three features: the high zero-crossing rate ratio HZCRR (High Zero Crossing Rate Ratio), the low short-time energy ratio LSTER (Low Short Time Energy Ratio) and the spectrum flux SF (Spectrum Flux). For discrimination we use the K nearest neighbors algorithm K-NN (K Nearest Neighbor) [8].
Feature selection
High zero-crossing rate ratio - HZCRR
Figure 1: Zero-crossing rate of an audio signal

HZCRR is computed as:

$$HZCRR = \frac{1}{2N}\sum_{n=0}^{N-1}\left[\operatorname{sgn}(ZCR_n - THL) + 1\right]$$
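where:
- n is the index of a frame within the feature-extraction window;
- N is the width of the feature-extraction window (number of frames);
- ZCR_n is the zero-crossing rate over a short interval (frame n):

$$ZCR_n = \frac{1}{2F}\sum_{m=1}^{F-1}\left|\operatorname{sgn}\left(x_n(m)\right) - \operatorname{sgn}\left(x_n(m-1)\right)\right|$$

  where F is the length of the short interval, usually one frame, and x_n(m) is the m-th sample of frame n;
- THL is the average zero-crossing rate over the window:

$$THL = \frac{1}{N}\sum_{n=0}^{N-1} ZCR_n$$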
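The HZCRR formulas translate almost directly into code. A minimal plain-Python sketch, assuming the analysis window has already been split into frames (lists of samples); `sgn`, `zcr` and `hzcrr` are hypothetical names, and THL is taken as the plain mean ZCR of the window, as defined above:

```python
def sgn(v):
    """Sign function: +1, 0 or -1."""
    return (v > 0) - (v < 0)


def zcr(frame):
    """ZCR_n = 1/(2F) * sum_{m=1}^{F-1} |sgn(x(m)) - sgn(x(m-1))| for F samples."""
    F = len(frame)
    crossings = sum(abs(sgn(frame[m]) - sgn(frame[m - 1])) for m in range(1, F))
    return crossings / (2 * F)


def hzcrr(frames):
    """HZCRR = 1/(2N) * sum_n [sgn(ZCR_n - THL) + 1], THL = mean ZCR of the window."""
    rates = [zcr(f) for f in frames]
    N = len(rates)
    thl = sum(rates) / N
    # each frame above THL adds 2, below adds 0, exactly at THL adds 1
    return sum(sgn(r - thl) + 1 for r in rates) / (2 * N)
```

For example, a frame of alternating samples such as [1, -1, 1, -1] has ZCR = 6/8 = 0.75, while a constant frame has ZCR = 0; a window of one such alternating frame and three constant frames therefore gives HZCRR = 0.25.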
Low short-time energy ratio - LSTER
LSTER is computed as:

$$LSTER = \frac{1}{2N}\sum_{n=0}^{N-1}\left[\operatorname{sgn}(THL - STE_n) + 1\right]$$
where:
- STE_n is the energy over a short interval (one frame):

$$STE_n = \sum_{m=0}^{F-1}\left[x_n(m)\,w(m)\right]^2$$

- w is the window function (rectangular or Hamming);
- THL is the average energy over the window:

$$THL = \frac{1}{N}\sum_{n=0}^{N-1} STE_n$$

Spectrum flux - SF
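SF is computed as:

$$SF = \frac{1}{(N-1)(K-1)}\sum_{n=1}^{N-1}\sum_{k=1}^{K-1}\left[\log\left(A(n,k)+\delta\right) - \log\left(A(n-1,k)+\delta\right)\right]^2$$

where:
- N is the number of frames in the window;
- K is the order of the DFT;
- δ is a small constant (= 0.01) that avoids the case log(0);
- A(n,k) is the magnitude of the discrete Fourier transform (DFT) of frame n:

$$A(n,k) = \left|\sum_{m=0}^{F-1} x_n(m)\,w(m)\,e^{-j\frac{2\pi}{K}km}\right|, \quad k = 0, \ldots, K-1$$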
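The STE, LSTER and SF definitions can be sketched together in plain Python. This assumes frames are equal-length lists of samples, `window` is a list of window weights of the same length (rectangular or Hamming), and the DFT order K equals the frame length F. The LSTER comparison of each frame against THL (the mean energy defined above) is an assumption about the exact form. All function names are hypothetical, and the naive DFT is for illustration only:

```python
import cmath
import math


def _sgn(v):
    return (v > 0) - (v < 0)


def short_time_energy(frame, window):
    """STE_n = sum_m [x_n(m) * w(m)]^2 over one frame."""
    return sum((x * w) ** 2 for x, w in zip(frame, window))


def lster(frames, window):
    """LSTER = 1/(2N) * sum_n [sgn(THL - STE_n) + 1], THL = mean STE (assumed form)."""
    energies = [short_time_energy(f, window) for f in frames]
    n_frames = len(energies)
    thl = sum(energies) / n_frames
    return sum(_sgn(thl - e) + 1 for e in energies) / (2 * n_frames)


def dft_magnitude(frame, window):
    """A(n, k) = |sum_m x_n(m) w(m) exp(-j 2 pi k m / K)|, k = 0..K-1, with K = F."""
    K = len(frame)
    xw = [x * w for x, w in zip(frame, window)]
    return [abs(sum(xw[m] * cmath.exp(-2j * math.pi * k * m / K) for m in range(K)))
            for k in range(K)]


def spectrum_flux(frames, window, delta=0.01):
    """SF: mean squared change of log(A + delta) between adjacent frames, bins 1..K-1."""
    spectra = [dft_magnitude(f, window) for f in frames]
    n_frames, K = len(spectra), len(spectra[0])
    total = 0.0
    for n in range(1, n_frames):
        for k in range(1, K):
            diff = math.log(spectra[n][k] + delta) - math.log(spectra[n - 1][k] + delta)
            total += diff ** 2
    return total / ((n_frames - 1) * (K - 1))
```

Identical consecutive frames give SF = 0, and SF grows as the spectrum changes from frame to frame, which is the behaviour the feature is meant to capture.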
Figure 2: 3-D histogram of the spectrum flux: (a) music, (b) speech

The K-NN algorithm
The K-NN algorithm [8] is a classification method based on a spatial distance criterion: it decides which region a point belongs to by computing distances in the feature space.
There are many ways to compute the distance between vectors, such as the Euclidean distance, the Hamming distance, the Mahalanobis distance or the City Block distance.
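For illustration, a minimal plain-Python sketch of these distance measures on feature vectors of equal length. The Mahalanobis version here is restricted to a diagonal covariance matrix (per-feature variances), an assumption made to keep the example short; the general form needs the inverse of the full covariance matrix. The function names are hypothetical:

```python
import math


def euclidean(a, b):
    # sqrt(sum_i (a_i - b_i)^2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    # sum_i |a_i - b_i|  (also called the Manhattan or L1 distance)
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    # number of positions whose components differ; meaningful for discrete
    # (e.g. quantized or binary) vectors rather than raw real-valued features
    return sum(x != y for x, y in zip(a, b))


def mahalanobis_diag(a, b, variances):
    # Mahalanobis distance for a diagonal covariance matrix only:
    # each squared difference is scaled by that feature's variance
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))
```

With unit variances the diagonal Mahalanobis distance reduces to the Euclidean distance; scaling by variance keeps a feature with a large numeric range from dominating the others.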
Problem: suppose we have a multidimensional space (Y_1, Y_2, ..., Y_n) and a set of regions A and B, where:
- in region A, we know the objects X_A1, X_A2, ..., X_Ap, with X_Ai = {Y_Ai1, Y_Ai2, ..., Y_Ain};
- in region B, we know the objects X_B1, X_B2, ..., X_Bq, with X_Bi = {Y_Bi1, Y_Bi2, ..., Y_Bin}.
Given an arbitrary object X_i = (Y_i1, Y_i2, ..., Y_in), we must determine whether X_i belongs to region A or to region B.
Figure 3: Illustration of the K-NN algorithm
Algorithm: among all objects whose region (A or B) is already known, find the K objects nearest to X_i. Count how many of these K objects belong to region A and how many to region B; X_i most likely belongs to the region with more objects near it.
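The voting rule above can be sketched in a few lines of Python, assuming the sample database is a list of (feature vector, label) pairs, where each vector holds the three features (HZCRR, LSTER, SF; this order is an assumption) and the distance is Euclidean. `knn_classify` is a hypothetical name, not the authors' code:

```python
import math
from collections import Counter


def knn_classify(x, samples, k=1):
    """Label x by majority vote among its k nearest labeled samples.

    samples: list of (feature_vector, label) pairs, e.g. labels "speech"/"music".
    """
    # sort the labeled samples by Euclidean distance to x and keep the k nearest
    nearest = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders equal counts by first occurrence, so a tie is
    # resolved in favour of the label of the nearer sample
    return votes.most_common(1)[0][0]


samples = [
    ((0.0, 0.0, 0.0), "music"),
    ((0.1, 0.0, 0.0), "music"),
    ((1.0, 1.0, 1.0), "speech"),
    ((0.9, 1.0, 1.0), "speech"),
]
```

The text reports results for K = 1 and K = 5; an odd K avoids ties in a two-class vote.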
The distance between two vectors is computed as:

$$D(X_i, X_j) = \sqrt{(Y_{i1}-Y_{j1})^2 + (Y_{i2}-Y_{j2})^2 + \cdots + (Y_{in}-Y_{jn})^2}$$

IMPLEMENTATION OF THE SPEECH/MUSIC DISCRIMINATION SYSTEM
The general structure of the system is shown in Figure 4.
The system operates in two separate phases: first the learning (training) phase, and second the discrimination phase applied to the input signal.
Training phase: the input signal is passed to feature analysis, where it is processed and the values of the required features are computed. These values are then passed to the training block, processed and stored in the sample database (DB). Training uses supervised learning, meaning that the class of each item in the training set is known in advance; here there are only two classes, speech and music. The sample features of each class are extracted and stored separately in the DB.
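The paper does not say how the sample DB is stored. A hypothetical sketch of one simple layout, one list of 3-D feature vectors per class persisted as JSON, which also shows the supervised step (the label is supplied with each vector). `SampleDatabase` and the file format are assumptions for illustration only:

```python
import json
from collections import defaultdict


class SampleDatabase:
    """Hypothetical sample DB: one list of feature vectors per class label."""

    def __init__(self):
        self.classes = defaultdict(list)

    def add(self, label, features):
        # supervised learning: the label ("speech" or "music") is known in advance
        self.classes[label].append(list(features))

    def save(self, path):
        # JSON layout: {"speech": [[hzcrr, lster, sf], ...], "music": [...]}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.classes, f)

    @classmethod
    def load(cls, path):
        db = cls()
        with open(path, encoding="utf-8") as f:
            for label, vectors in json.load(f).items():
                db.classes[label].extend(vectors)
        return db

    def labeled_samples(self):
        # flatten into (vector, label) pairs, the form a K-NN classifier consumes
        return [(v, label) for label, vs in self.classes.items() for v in vs]
```

Keeping each class in its own list mirrors the description above, where the sample features of each class are stored separately.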
Figure 4: General model of the system

Discrimination phase: the procedure is the same as above, except that after feature extraction the signal is passed to the discrimination block. There it is compared with the trained sample DB using the K-NN algorithm. The result is then passed to the decision block, which determines the class of the signal. The feature vector is three-dimensional because only the three features presented above are used.
Framing: because the speech signal is stationary over a few tens of milliseconds, it is usually divided into short segments of about 10 to 30 ms before analysis and transforms. This is called framing; consecutive frames may overlap by about half a frame.
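A minimal sketch of framing with half-frame overlap, followed by Hamming windowing (w(n) = 0.54 - 0.46 cos(2πn/(N-1)), the window discussed in this section). The 20 ms frame length is one assumed value inside the 10 to 30 ms range; `frame_signal` and `hamming_window` are hypothetical names:

```python
import math


def hamming_window(N):
    """w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), n = 0..N-1 (N >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def frame_signal(x, sample_rate, frame_ms=20, overlap=0.5):
    """Split x into frame_ms-long frames overlapping by `overlap` of a frame,
    multiplying each frame by a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = max(1, int(frame_len * (1 - overlap)))
    w = hamming_window(frame_len)
    return [[x[start + i] * w[i] for i in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]
```

At 16 kHz this gives 320-sample frames advanced by 160 samples, so one second of audio yields 99 full frames.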
However, framing introduces errors in the transforms relative to the original signal, so a window function should be applied to limit the errors caused by the finite length of the signal segments.
The window function most commonly used is the Hamming window:
$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

where N is the frame length in samples.

RESULTS
System implementation
We implemented the discrimination system with standard WAVE audio files (*.wav) as input; all computation, processing and discrimination are performed on these WAVE files. As analyzed above, the training phase consists of the following basic steps:
Figure 6: Model of the training phase
For each audio signal read, we determine the signal frames and compute the basic parameters of the signal: STE, ZCR and A.
Implementation interface for the training phase:
Figure 5: Signal framing
Figure 7: Training interface, creating sample data
- On the right are the plots of the signal: the first panel shows the audio waveform, followed by the short-time energy and the zero-crossing rate of the signal.
- On the left are the controls: open a WAVE file, play it back, mark it as speech or music, and save the data.
Discrimination phase:
Figure 8: Model of the discrimination phase
The discrimination phase shares several steps with the training phase, such as reading the WAVE file data, computing the basic parameters and computing the feature values.
Figure 9: Discrimination interface
Like the training interface, the discrimination interface has the following parts:
- On the right are the plots of the signal: the first panel shows the audio waveform, followed by the short-time energy and the zero-crossing rate. Unlike the training interface, however, the discrimination interface has a fourth panel showing whether the signal is speech or music (speech is drawn with an amplitude of 2/3 of the panel height, music with 1/3).
- On the far left is the control panel for opening and selecting the WAVE file. The parameter K can also be chosen (K is the number of sample feature vectors nearest to the sample being discriminated).
Evaluation
The program discriminates speech from music using a set of sample audio signals that we collected: the speech set is in Vietnamese, and the music set consists of instrumental pieces from several musical styles.
The speech set consists of 1037 files, each containing one spoken Vietnamese word; each file is shorter than 1 s, with a sampling rate of 16000 Hz and 16 bits per sample.
The music set consists of 77 files of instrumental music in genres such as R&B, Rock and Country; each file is shorter than 30 s, with the same sampling rate of 16000 Hz and 16 bits per sample.
All of the sample data files above are mono (single-channel) audio files.
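A minimal sketch, using only Python's standard `wave` and `struct` modules, of loading a file in the format described above (mono, 16 bits per sample, 16000 Hz) and rejecting anything else. `read_mono16` and its `max_seconds` check (for the under 1 s and under 30 s limits of the two sets) are hypothetical:

```python
import struct
import wave


def read_mono16(path, expected_rate=16000, max_seconds=None):
    """Read a mono 16-bit PCM WAVE file into a list of floats in [-1, 1).

    Raises ValueError if the file is not mono, not 16-bit, not sampled at
    expected_rate, or (when max_seconds is given) not shorter than max_seconds.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected a mono file")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wf.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz")
        n = wf.getnframes()
        if max_seconds is not None and n / expected_rate >= max_seconds:
            raise ValueError("file is too long")
        raw = wf.readframes(n)
    # PCM samples in WAVE files are little-endian signed 16-bit integers
    samples = struct.unpack(f"<{n}h", raw)
    return [s / 32768.0 for s in samples]
```

A speech file would be read with `max_seconds=1` and a music file with `max_seconds=30`.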
Our tests and statistics show that the program discriminates speech from music with good accuracy when speech and music occur separately. The results obtained are as follows:
Table 1: Statistics of the sample database

                     Music          Speech
Mean LSTER           0.2048         0.14599
Mean HZCRR           (illegible)    0.2632
Mean SF              0.3885         0.22
Table 2: Discrimination results for speech and music inputs with K = 1

Input     Recognized as music    Recognized as speech    Total
Music     10838432 (92.36%)      897324 (7.64%)          11735756 (100%)
Speech    945553 (15.56%)        5131722 (84.44%)        6077275 (100%)
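The percentages in the results tables are row-normalized: each count is divided by the total for its true input class, so the diagonal entries are per-class recognition rates. A short sketch of that arithmetic with the Table 2 counts (the unit being counted is not stated in the text; `row_rates` is a hypothetical helper):

```python
def row_rates(counts):
    """Row-normalize a confusion table {true_class: {predicted_class: count}}."""
    return {true: {pred: c / sum(row.values()) for pred, c in row.items()}
            for true, row in counts.items()}


# Counts from Table 2
table2 = {
    "music": {"music": 10838432, "speech": 897324},
    "speech": {"music": 945553, "speech": 5131722},
}
rates = row_rates(table2)
```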
Table 3: Discrimination results for speech and music inputs with K = 5

Input     Recognized as music    Recognized as speech    Total
Music     10878964 (92.7%)       856792 (7.3%)           11735756 (100%)
Speech    974188 (16.03%)        5103087 (83.97%)        6077275 (100%)

CONCLUSION
In this study we focused mainly on analyzing and evaluating the physical and perceptual characteristics of two kinds of signal, music and speech. After testing the three features HZCRR, LSTER and SF with the K-NN classification algorithm, we found the results to be fairly good. In the future we will continue to develop the system into a complete one that can automatically discriminate speech from music for practical use, such as applications that automatically collect information and index multimedia data.
REFERENCES
[1]. Gerhard, D. (2000), "Audio Signal Classification: An Overview", Canadian Artificial Intelligence, 45:4-6, Winter.
[2]. Peltonen, V. (2001), "Computational Auditory Scene Recognition", MSc Thesis, Tampere University.
[3]. Saunders, J., "Real-Time Discrimination of Broadcast Speech/Music", Proc. ICASSP, pp. 993-996.
[4]. Srinivasan, S., Petkovic, D., Ponceleon, D. (1999), "Toward robust features for classifying audio in the CueVideo System", Proc. 7th ACM Int. Conf. Multimedia, pp. 393-400.
[5]. Plumbley, M.D., Abdallah, S.A., Bello, J.P., Davies, M.E., Monti, G., Sandler, M.B. (2002), "Automatic music transcription and audio source separation", Cybernetics and Systems, 33(6):603-627.
[6]. Lu, L., Jiang, H., Zhang, H.J. (2001), "A robust audio classification and segmentation method", Proc. 9th ACM Int. Conf. Multimedia, pp. 203-211.
[7]. Scheirer, E., Slaney, M. (1997), "Construction and Evaluation of a Robust Multifeature Speech/Music Discriminator", Proc. ICASSP, pp. 1331-1334.
[8]. Theodoridis, S., Koutroumbas, K. (1999), "Pattern Recognition", Academic Press.
SUMMARY
RESEARCH INTO METHOD OF DISCRIMINATION BETWEEN SPEECH AND MUSIC
Do Thi Loan*, Luu Thi Lieu, Nguyen Thi Hien
College of Information and Communication Technology - Thai Nguyen University
Automatic discrimination of speech and music is an important tool in many multimedia applications. To discriminate speech from music, we used three features: HZCRR (High Zero Crossing Rate Ratio), LSTER (Low Short Time Energy Ratio) and SF (Spectrum Flux); the algorithm used for both training and discrimination is K Nearest Neighbor (K-NN). The data consist of music segments of different genres, such as Vietnamese music, Rock, Pop and country music, and speech segments of male and female voices in Vietnamese. The main objective of the research in this article is to discriminate two classes of audio signal: speech and music. The results are fairly accurate: about 84% for speech and 92% for music. In the future, we would like to extend the system to classify more classes of audio signal.
Keywords: discrimination, speech, music, Vietnamese music, Vietnamese
Scientific reviewer: Dr. Pham Duc Long, College of Information and Communication Technology, Thai Nguyen University
* Tel: 0972998865, Email: dtloan@ictu.edu.vn