
Tạp chí Khoa học Trường Đại học Cần Thơ (Can Tho University Journal of Science), Part A: Natural Sciences, Technology and Environment: 32 (2014): 35-41

Website: sj.ctu.edu.vn

DATA CLASSIFICATION USING THE NEWTON SVM ALGORITHM

Đỗ Thanh Nghị (1), Nguyễn Minh Trung (2) and Phạm Nguyên Khang (1)

(1) College of Information and Communication Technology, Can Tho University
(2) College of Natural Sciences, Can Tho University

Article info:

Received: 19/02/2014. Accepted: 30/06/2014.

Title: Data classification using the Newton Support Vector Machine algorithm

Keywords: Newton support vector machine algorithm, adaptive reweighting and combining, ARC-x4, classifying large datasets

ABSTRACT

In this paper, we propose a new machine learning algorithm, ARC-x4 of finite Newton Support Vector Machine (ARC-x4-NSVM), for classifying very large datasets on standard personal computers (PCs). SVMs and kernel-related methods provide accurate classification models, but training them usually requires solving a quadratic program, which demands large memory capacity and long training time. We extend the NSVM recently proposed by Mangasarian to build a boosting-SVM algorithm.

We use the Sherman-Morrison-Woodbury formula to adapt the NSVM to datasets with a very large number of dimensions. We also apply the ARC-x4 approach proposed by Breiman to NSVM for classifying massive datasets with both a very large number of data points and a very large number of dimensions. We evaluate its performance on bio-medical datasets using a PC (2.4 GHz Pentium IV, 2 GB RAM).

TÓM TẮT (SUMMARY)

We present a new learning algorithm, ARC-x4 Newton support vector machine (ARC-x4-NSVM), for classifying large datasets on personal computers. Support vector machines (SVMs) and kernel methods provide accurate classification models, but training requires solving a quadratic program, which takes a long time and a lot of memory. We propose to extend Mangasarian's NSVM learning algorithm to build an improved SVM algorithm. We apply the Sherman-Morrison-Woodbury formula to NSVM so that it can handle data with a very large number of dimensions. We then combine it with Breiman's ARC-x4 method to build ARC-x4-NSVM, which can classify data that are large in both the number of data points and the number of dimensions. We evaluate the proposed algorithm on bio-medical datasets using a personal computer (2.4 GHz Pentium IV, 2 GB RAM).

1 INTRODUCTION

Since its introduction by Vapnik [15], the support vector machine (SVM) has become an effective machine learning method for classification and regression problems. SVMs have been applied successfully in many applications such as face recognition, text categorization and cancer classification (see [8]). Combined with kernel methods, SVMs provide effective and accurate models for nonlinear classification and regression problems in practice.

Despite these advantages, training an SVM is very time-consuming and memory-intensive because it requires solving a quadratic program. The training complexity of an SVM is at least quadratic in the number of data points. Improvements are therefore needed so that SVM learning can handle datasets that are large in both the number of data points and the number of dimensions.

Several approaches have been proposed to speed up SVM training on large datasets. The works in [2], [4], [11] decompose the original quadratic program into sub-problems that are solved separately. The studies in [12], [13] propose incremental learning algorithms that load the data block by block and update the model accordingly, without holding the whole dataset in memory. The work in [13] proposes a parallel algorithm running over a network to improve training speed. Tong & Koller [14] propose selecting a subset of the data instead of learning on the whole original dataset.

In this paper, we present a new learning algorithm, ARC-x4-NSVM, for classifying large datasets on personal computers.

We extend the NSVM learning algorithm of Mangasarian [10]. NSVM only needs to solve systems of linear equations instead of the more complex quadratic program required by the standard SVM algorithm. We adapt NSVM to the classification of high-dimensional data, which is common in text categorization and bioinformatics.

To reach this goal, we apply the Sherman-Morrison-Woodbury formula [7], which lets the modified NSVM work on data with a very large number of dimensions but a small number of data points (rows). We then combine it with the ARC-x4 algorithm of Breiman [3] to obtain ARC-x4-NSVM, which can classify data that are large in both the number of rows and the number of dimensions. We evaluate ARC-x4-NSVM in terms of training time and accuracy on a personal computer (Pentium 2.4 GHz, 2 GB RAM, Linux). Experiments on large bio-medical datasets [9] show that the proposed ARC-x4-NSVM trains very fast and achieves high accuracy compared with a standard SVM implementation such as LibSVM [4].

The rest of the paper is organized as follows. Section 2 briefly presents the NSVM learning algorithm. Section 3 shows how to build the ARC-x4-NSVM algorithm for classifying large datasets. Experimental results are presented in Section 4, before the conclusion and future work.

We use the following notation in the rest of the paper: all vectors are column vectors; the inner product of two vectors $x$ and $y$ is denoted $x \cdot y$; the norm of a vector $x$ is denoted $\|x\|$; $x^T$ is the transpose of $x$; and $e$ is the column vector whose components are all equal to 1.

2 NSVM LEARNING ALGORITHM

Consider the linear binary classification task shown in Figure 1. We are given $m$ data points $x_1, x_2, \ldots, x_m$ in an $n$-dimensional space, represented by the matrix $A[m \times n]$. Their class labels $y_1, y_2, \ldots, y_m$ take the value $+1$ or $-1$ and are represented by the diagonal matrix $D[m \times m]$ of $\pm 1$: $y_i = 1$ if $x_i$ belongs to class $+1$ (the positive class, the class of interest) and $y_i = -1$ if $x_i$ belongs to class $-1$ (the negative class, or all remaining classes).

2.1 SVM algorithm

The SVM learning algorithm of Vapnik [15] searches for the optimal hyperplane (defined by its normal vector $w$ and its offset $b$) based on the two supporting hyperplanes of the two classes.

The data points $A_i$ of class $+1$ lie on the right-hand side of the supporting hyperplane for class $+1$, and the data points $A_j$ of class $-1$ lie on the left-hand side of the supporting hyperplane for class $-1$:

$$A_i \cdot w - b \ge 1, \quad \forall i \text{ with } D[i,i] = 1 \tag{1}$$

$$A_j \cdot w - b \le -1, \quad \forall j \text{ with } D[j,j] = -1 \tag{2}$$

Combining (1) and (2) gives:

$$D(Aw - eb) \ge e \tag{3}$$

where $e$ is the column vector whose components are all equal to 1.

Figure 1: Linear classification with an SVM

The distance between the two supporting hyperplanes is called the margin and is computed as:

$$\text{margin} = \frac{2}{\|w\|} \tag{4}$$

where $\|w\|$ is the norm of the vector $w$.

The resulting hyperplane $(w, b)$, which separates the points into two classes, lies midway between the two supporting hyperplanes. Any point $x_i$ that falls on the wrong side of its supporting hyperplane is considered an error. The error distance is represented by $z_i \ge 0$: if $x_i$ is on the correct side of its supporting hyperplane then $z_i = 0$, otherwise $z_i > 0$ is the distance from $x_i$ to its supporting hyperplane. Finding the optimal hyperplane therefore amounts to maximizing the margin (the larger the margin, the safer the classification model) while minimizing the errors. The SVM algorithm leads to the following quadratic program:

$$\min f(w, b, z) = C\|z\|_1 + \frac{1}{2}\|w\|^2 \tag{5}$$

subject to $D(Aw - eb) + z \ge e$, where $z \ge 0$ is the vector of errors and $C > 0$ is a constant that tunes the trade-off between the errors and the width of the margin between the two supporting hyperplanes.

Solving the quadratic program (5) yields the hyperplane $(w, b)$. A new data point $x$ is then classified with the resulting hyperplane $(w, b)$ as follows:

$$\text{predict}(x) = \text{sign}(w \cdot x - b) \tag{6}$$

The basic SVM algorithm only solves linear classification problems. Combined with kernel-based methods, however, it can also solve nonlinear classification problems. More details can be found in [1], [5].

The computational complexity of the quadratic program (5) is at least $O(m^2)$, where $m$ is the number of data points used for training. This makes the SVM algorithm unsuitable for large datasets.
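The quantities defined in this subsection can be illustrated with a minimal Python sketch. It assumes a hyperplane $(w, b)$ is already known and uses the sign convention of constraints (1) and (2); the function names `slack`, `margin` and `predict`, the toy data, and the choice to map a score of exactly zero to class +1 are illustrative assumptions, not part of the paper.

```python
import math

def slack(A, y, w, b):
    """Error distances z_i = max(0, 1 - y_i * (A_i . w - b)), i.e. z = (e - D(Aw - eb))_+."""
    return [max(0.0, 1.0 - yi * (sum(a * wj for a, wj in zip(row, w)) - b))
            for row, yi in zip(A, y)]

def margin(w):
    """Distance between the two supporting hyperplanes, 2 / ||w||, as in (4)."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

def predict(x, w, b):
    """Sign of w . x - b; a score of exactly 0 is mapped to +1 (a convention chosen here)."""
    score = sum(xj * wj for xj, wj in zip(x, w)) - b
    return 1 if score >= 0 else -1

# Toy data (hypothetical): three 2-D points and a hand-picked hyperplane.
A = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
y = [1, 1, -1]
w, b = [1.0, 0.0], 0.0

z = slack(A, y, w, b)      # only the second point violates its supporting hyperplane
print(z, margin(w), predict([3.0, 1.0], w, b))
```

The second point lies between the two supporting hyperplanes, so it is the only one with a positive error distance.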

2.2 Newton SVM algorithm (NSVM)

The NSVM algorithm proposed by Mangasarian [10] modifies the original SVM problem by:

- using the least-squares error $\frac{C}{2}\|z\|^2$ (instead of $C\|z\|_1$);
- maximizing the margin with $\frac{1}{2}\|w, b\|^2$ (instead of $\frac{1}{2}\|w\|^2$).

The SVM problem is then rewritten as (7):

$$\min f(w, b, z) = \frac{C}{2}\|z\|^2 + \frac{1}{2}\|w, b\|^2 \tag{7}$$

subject to $D(Aw - eb) + z \ge e$, where $z \ge 0$ is the vector of errors and $C > 0$ is a constant that tunes the trade-off between the errors and the width of the margin between the two supporting hyperplanes.

Substituting $z = (e - D(Aw - eb))_+$ from the constraint into the objective function $f$ of (7), where $(x)_+$ replaces the negative components of a vector $x$ by 0, gives the unconstrained optimization problem (8):

$$\min f(w, b) = \frac{C}{2}\|(e - D(Aw - eb))_+\|^2 + \frac{1}{2}\|w, b\|^2 \tag{8}$$

Setting $u = [w_1\ w_2\ \ldots\ w_n\ b]^T$ and $H = [A\ \ {-e}]$, the SVM formulation (8) is rewritten as (9):

$$\min f(u) = \frac{C}{2}\|(e - DHu)_+\|^2 + \frac{1}{2}u^T u \tag{9}$$

Mangasarian [10] proposed a Newton iteration to solve the unconstrained optimization problem (9). The algorithm is described in Table 1. Mangasarian also proved that the sequence $\{u_i\}$ generated by the Newton iteration converges to the global optimum. In most practical test cases, the Newton iteration converges to the solution within 5 to 8 iterations. Note that the NSVM iteration only requires solving the system of linear equations (10) in $(n+1)$ variables, instead of a quadratic program as in the standard SVM algorithm. Hence, if the number of dimensions $n$ is on the order of less than 100, then even when the number of data points $m$ reaches millions, NSVM can classify them in a few seconds on a personal computer.

Table 1: NSVM iterative algorithm

- Input: training dataset, $A[m \times n]$ and $D[m \times m]$
- Start with $u_0 \in R^{n+1}$ and $i = 0$
- Repeat
  1) $u_{i+1} = u_i - \partial^2 f(u_i)^{-1}\, \nabla f(u_i)$    (10)
  2) $i = i + 1$
  Until $\nabla f(u_i) = 0$
- Return $u_i$

The gradient of $f$ at $u_i$ is:

$$\nabla f(u_i) = C(-DH)^T (e - DHu_i)_+ + u_i \tag{11}$$

and the Hessian matrix, the matrix of second-order partial derivatives of $f$ at $u_i$, is:

$$\partial^2 f(u_i) = C(-DH)^T \operatorname{diag}\big((e - DHu_i)_*\big)(-DH) + I \tag{12}$$

where $\operatorname{diag}\big((e - DHu_i)_*\big)$ is the $m \times m$ diagonal matrix whose $j$-th diagonal entry is the derivative of the corresponding component of $(e - DHu_i)_+$, that is, 1 where the component of $e - DHu_i$ is positive and 0 where it is negative.
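The iteration of Table 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' C/C++ implementation: it uses the pure Newton step (10) with no step-size control, stops when the gradient (11) is numerically close to zero (the tolerance `tol` and the cap `max_iter` are choices made here), and solves the linear system with a small hand-written Gaussian elimination. The names `solve` and `nsvm_train` and the toy data are hypothetical.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def nsvm_train(A, y, C=1000.0, max_iter=50, tol=1e-8):
    """Newton iteration on f(u) = C/2 ||(e - DHu)_+||^2 + 1/2 u^T u, with u = [w; b]."""
    m, n = len(A), len(A[0])
    # Row k of G = -DH, where H = [A, -e]:  G_k = [-y_k * A_k, y_k]
    G = [[-y[k] * v for v in A[k]] + [float(y[k])] for k in range(m)]
    u = [0.0] * (n + 1)
    for _ in range(max_iter):
        # r = e - DHu = e + Gu; only rows with r_k > 0 contribute (plus function)
        r = [1.0 + sum(G[k][j] * u[j] for j in range(n + 1)) for k in range(m)]
        active = [k for k in range(m) if r[k] > 0]
        # Gradient (11): C * G^T (r)_+ + u
        grad = [C * sum(G[k][j] * r[k] for k in active) + u[j] for j in range(n + 1)]
        if max(abs(g) for g in grad) < tol:
            break
        # Hessian (12): C * G^T diag(step(r)) G + I
        hess = [[C * sum(G[k][a] * G[k][c] for k in active) + (1.0 if a == c else 0.0)
                 for c in range(n + 1)] for a in range(n + 1)]
        step = solve(hess, grad)          # Newton step (10)
        u = [u[j] - step[j] for j in range(n + 1)]
    return u[:n], u[n]

# Toy 1-D problem (hypothetical data): negatives on the left, positives on the right.
A = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
w, b = nsvm_train(A, y, C=10.0)
print(w, b)
```

The sketch forms the full $(n+1) \times (n+1)$ Hessian, so it only suits data with few dimensions.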

3 ARC-x4-NSVM ALGORITHM FOR CLASSIFYING LARGE DATASETS

3.1 NSVM for data with many dimensions but few data points

In applications such as bioinformatics or text categorization, the datasets usually have a very large number of dimensions $n$ (more than 1000) but a small number of data points $m$ (about 50 to 250). For such datasets, the $(n+1) \times (n+1)$ Hessian matrix $\partial^2 f(u_i)$ in (12) is large, and inverting it, or solving the system of linear equations in (10), becomes too complex and time-consuming. To adapt NSVM to this case, we apply the Sherman-Morrison-Woodbury theorem [7] to compute the inverse matrix $\partial^2 f(u_i)^{-1}$ in (10) as follows.

Let $Q = \operatorname{diag}\big(\sqrt{C\,(e - DHu_i)_*}\big)$ and $P = Q(-DH)$. The inverse matrix $\partial^2 f(u_i)^{-1}$ can then be rewritten as (13):

$$\partial^2 f(u_i)^{-1} = (I + P^T P)^{-1} \tag{13}$$

The Sherman-Morrison-Woodbury theorem is given by (14):

$$(A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^T A^{-1} U)^{-1} V^T A^{-1} \tag{14}$$

Applying the Sherman-Morrison-Woodbury theorem (14) to the right-hand side of (13), we obtain the inverse matrix $\partial^2 f(u_i)^{-1}$ as in (15):

$$\partial^2 f(u_i)^{-1} = (I + P^T P)^{-1} = I - P^T (I + PP^T)^{-1} P \tag{15}$$

Computing the inverse $\partial^2 f(u_i)^{-1}$ with (15) only requires inverting the $m \times m$ matrix $(I + PP^T)$, instead of the $(n+1) \times (n+1)$ matrix in (10). The computational cost of NSVM then depends on the number of data points rather than on the number of dimensions. This reformulation makes it possible to handle datasets with a very large number of dimensions but few data points.

3.2 ARC-x4-NSVM for data with many data points and many dimensions

When classifying datasets that have both a large number of dimensions and a very large number of data points (over 10^…), as in text categorization for example, at least two problems must be solved: the training time grows in proportion to the number of data points used for training, and the memory needed to store the data grows as well.

To handle data in which both the number of dimensions and the number of data points are large, we apply boosting approaches such as Adaboost of Freund & Schapire [6] and ARC-x4 of Breiman [3] to NSVM. This brings two benefits: (i) large datasets can be processed, and (ii) accuracy is improved.

Adaboost is a general method for improving the accuracy of weak classifiers. It repeats the training of a weak classifier a given number of times, each time on a subset sampled from the training dataset. Each iteration concentrates on the data points that were misclassified in the previous iteration. To do so, each data point is assigned a weight.

The weights are initialized to equal values. After each iteration, the weights of the misclassified data points are increased. At the end of the process there is one weak classifier per iteration, and a new data point is classified by a weighted vote of these weak classifiers. ARC-x4 is similar to Adaboost, except that ARC-x4 concentrates on the data points misclassified in all previous iterations, and a new data point is classified by a majority vote (without weights).

NSVM can be regarded as a weak classifier because, in each iteration, it is trained only on a subset of the training dataset. Adaboost or ARC-x4 can therefore be applied to NSVM.
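The ARC-x4 scheme described above can be sketched in Python as follows. The sampling rule, in which a data point misclassified $k$ times so far is drawn with probability proportional to $1 + k^4$, is the rule from Breiman's arc-x4 [3]; everything else is an illustrative assumption. In particular, `train_stump` is a trivial stand-in weak learner used only to keep the example self-contained (the paper uses NSVM), and the names `arc_x4_probabilities`, `arc_x4` and `majority_vote`, the seed, and the toy data are hypothetical.

```python
import random

def arc_x4_probabilities(miss_counts):
    """Sampling probabilities proportional to 1 + k^4, k = misclassification count."""
    raw = [1.0 + k ** 4 for k in miss_counts]
    total = sum(raw)
    return [r / total for r in raw]

def train_stump(X, y):
    """Stand-in weak learner (NOT NSVM): threshold on the first feature."""
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t == -1]
    if not pos or not neg:                      # sample contains a single class
        label = 1 if pos else -1
        return lambda x: label
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    thr, side = (mp + mn) / 2.0, (1 if mp >= mn else -1)
    return lambda x: side if x[0] >= thr else -side

def arc_x4(train_fn, X, y, rounds=30, sample_frac=0.3, seed=0):
    """Train `rounds` weak models, each on a resampled subset of the data."""
    rng = random.Random(seed)
    m = len(X)
    miss = [0] * m                              # misclassifications over all rounds
    size = max(1, int(sample_frac * m))
    models = []
    for _ in range(rounds):
        probs = arc_x4_probabilities(miss)
        idx = rng.choices(range(m), weights=probs, k=size)
        model = train_fn([X[i] for i in idx], [y[i] for i in idx])
        models.append(model)
        for i in range(m):                      # update counts on the full training set
            if model(X[i]) != y[i]:
                miss[i] += 1
    return models

def majority_vote(models, x):
    """Unweighted vote; a tie is mapped to +1 (a convention chosen here)."""
    return 1 if sum(model(x) for model in models) >= 0 else -1

X = [[-2.0], [-1.8], [-1.5], [-1.2], [-1.0], [1.0], [1.2], [1.5], [1.8], [2.0]]
y = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
models = arc_x4(train_stump, X, y, rounds=30, sample_frac=0.3, seed=0)
print(majority_vote(models, [1.5]))
```

Replacing `train_stump` with an NSVM trainer would give the combination the paper describes; the default values 30 and 0.3 mirror the 30 models and 30% samples reported in Section 4.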

For datasets with a large number of data points and fewer than 100 dimensions, NSVM is used as in (10). For data with a large number of dimensions, or with both a large number of dimensions and a large number of data points, boosting of NSVM can still solve the problem quickly: although the number of dimensions is large, the number of data points used for training in each iteration is small, so (15) can be applied.
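Identity (15), which is what keeps each small-subset iteration cheap, can be checked by direct multiplication. The short derivation below is a sketch added for illustration; it uses only the symbols $P$ and $I$ from (13) to (15), with $P$ of size $m \times (n+1)$.

```latex
\begin{align*}
(I + P^T P)\,\bigl(I - P^T (I + PP^T)^{-1} P\bigr)
  &= I + P^T P - P^T (I + PP^T)^{-1} P - P^T P P^T (I + PP^T)^{-1} P \\
  &= I + P^T P - P^T (I + PP^T)(I + PP^T)^{-1} P \\
  &= I + P^T P - P^T P \\
  &= I .
\end{align*}
```

Only the $m \times m$ matrix $I + PP^T$ is inverted, which is why the cost follows the number of data points in the sampled subset rather than the number of dimensions.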

Note that NSVM trains much faster than a standard SVM on the subsets (which are much smaller than the whole training set). We are therefore interested in boosting NSVM rather than a standard SVM.

We built the ARC-x4-NSVM algorithm to handle large datasets. According to Breiman [3], experiments show that ARC-x4 gives results equivalent to Adaboost. Computationally, however, ARC-x4 is much simpler than Adaboost. This is also why we chose ARC-x4 instead of Adaboost.

4 EXPERIMENTAL RESULTS

To evaluate the performance of ARC-x4-NSVM for classifying large datasets, we implemented the algorithm in C/C++. We also compare the training time and classification accuracy of the proposed ARC-x4-NSVM with LibSVM [4], a standard SVM implementation widely used in the machine learning community. All algorithms were run on a personal computer (2.4 GHz Pentium IV, 2 GB RAM) running Linux.

Table 2: Description of the bio-medical datasets

Dataset                        Classes  Data points  Dimensions  Evaluation protocol
ALL-AML Leukemia               2        72           7129        38 trn - 34 tst
Breast Cancer                  2        97           24481       78 trn - 19 tst
Ovarian Cancer                 2        253          15154       leave-1-out
Lung Cancer                    2        181          12533       32 trn - 149 tst
Translation Initiation Sites   2        13375        927         hold-out

The experimental datasets were downloaded from the website of Jinyan & Huiqing [9]. These are bio-medical datasets with a large number of dimensions, on the order of thousands, and a number of data points ranging from a few dozen to tens of thousands. Consider for example the Ovarian Cancer dataset, with 253 data points and 15154 dimensions. With the original NSVM algorithm of Mangasarian [10], each iteration must invert a 15155 x 15155 matrix ((n+1) x (n+1)), with a corresponding complexity of 15155^3.

In contrast, our modified algorithm inverts a 253 x 253 matrix (m x m) at each iteration, with a corresponding complexity of 253^3.

This reduces the complexity by a factor of about 210000. Inverting a 15155 x 15155 matrix is in fact very hard to carry out on a PC. This shows that the modified algorithm is more efficient than the original one on bio-medical data.
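The reduction factor quoted above can be reproduced with a few lines of Python. The sketch assumes the cubic cost model used in the text (inverting a $k \times k$ matrix costs on the order of $k^3$ operations); the variable names are illustrative.

```python
n, m = 15154, 253              # Ovarian Cancer: dimensions, data points

direct = (n + 1) ** 3          # cost model for the (n+1) x (n+1) Hessian inverse in (10)
reduced = m ** 3               # cost model for the m x m inverse of I + P P^T in (15)
ratio = direct / reduced

print(f"reduction factor: about {ratio:,.0f}")
```

Under this model the ratio is a little over 2.1 x 10^5, in line with the figure of about 210000 given in the text.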

Table 2 describes the experimental datasets. Its last column gives the evaluation protocol used for the algorithms. The three datasets ALL-AML Leukemia, Breast Cancer and Lung Cancer are already split into a training set (trn) and a test set (tst). For these datasets, the model is trained on the training set, the training time is measured, and the test set is used to compute the classification accuracy. The Ovarian Cancer dataset uses the leave-1-out protocol: training and testing are repeated 253 times, each time with 1 data point as the test set for measuring accuracy and the remaining 252 data points as the training set for measuring training time; the training time and accuracy are then averaged.

The Translation Initiation Sites dataset uses the hold-out protocol: 2/3 of the original dataset is randomly selected as the training set to measure the training time of the model, and the remaining 1/3 is used as the test set to compute the classification accuracy.
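The two resampling protocols can be sketched in Python as index splits. This is an illustration only: the function names `leave_one_out` and `hold_out`, the random seed, and the rounding of the 2/3 cut to the nearest integer are assumptions, since the paper does not specify them.

```python
import random

def leave_one_out(m):
    """Yield (train_indices, test_indices) pairs, one per data point."""
    for i in range(m):
        yield [j for j in range(m) if j != i], [i]

def hold_out(m, train_frac=2.0 / 3.0, seed=0):
    """Randomly split indices 0..m-1 into a training part and a test part."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * m)
    return idx[:cut], idx[cut:]

# Ovarian Cancer: 253 rounds, each with 252 training points and 1 test point.
loo_splits = list(leave_one_out(253))

# Translation Initiation Sites: 13375 points split roughly 2/3 - 1/3.
train_idx, test_idx = hold_out(13375)
print(len(loo_splits), len(train_idx), len(test_idx))
```

In the leave-1-out case the reported training time and accuracy are averages over the 253 rounds, as described above.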

Table 3: Classification results on the bio-medical datasets

                               LibSVM                             ARC-x4-NSVM
Dataset                        Training time (s)  Accuracy (%)    Training time (s)  Accuracy (%)
ALL-AML Leukemia               9.14               97.06           5.01               97.06
Breast Cancer                  269.66             73.68           75.43              84.21
Ovarian Cancer                 403.60             100             10.13              100
Lung Cancer                    20.80              98.66           3.51               98.00
Translation Initiation Sites   314.00             92.41           50.27              90.80

ARC-x4-NSVM builds 30 NSVM models, each trained on a sample of 30% of the original dataset. Both LibSVM and NSVM use a linear kernel with the constant C = 1000 (which tunes the trade-off between margin width and errors), the setting that gave the best results. The results are presented in Table 3. Figures 2 and 3 plot the comparison of training time and classification accuracy.

Figure 2: Comparison of training time (LibSVM vs. ARC-x4-NSVM)

Figure 3: Comparison of classification accuracy (LibSVM vs. ARC-x4-NSVM)

The comparison shows that ARC-x4-NSVM trains faster than LibSVM (by a factor of 2 to 40). The classification accuracy of ARC-x4-NSVM can be considered equivalent to that of LibSVM.
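The speed-up range can be recomputed from the training times in Table 3 with the short Python sketch below. It assumes that the first pair of columns in Table 3 belongs to LibSVM and the second to ARC-x4-NSVM, which is the reading consistent with the statement that ARC-x4-NSVM is the faster method; the variable names are illustrative.

```python
datasets = ["ALL-AML Leukemia", "Breast Cancer", "Ovarian Cancer",
            "Lung Cancer", "Translation Initiation Sites"]
libsvm_time = [9.14, 269.66, 403.60, 20.80, 314.00]   # training time in seconds
arc_time = [5.01, 75.43, 10.13, 3.51, 50.27]          # training time in seconds

# Speed-up of ARC-x4-NSVM over LibSVM for each dataset.
speedup = [t_lib / t_arc for t_lib, t_arc in zip(libsvm_time, arc_time)]

for name, s in zip(datasets, speedup):
    print(f"{name}: {s:.1f}x")
```

The smallest ratio is slightly below 2 (ALL-AML Leukemia) and the largest is close to 40 (Ovarian Cancer), which matches the range quoted in the text.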

5 CONCLUSION AND FUTURE WORK

We have presented a new machine learning algorithm, ARC-x4-NSVM, for classifying large datasets on personal computers. We extend the NSVM algorithm of Mangasarian to build ARC-x4-NSVM. We apply the Sherman-Morrison-Woodbury theorem to adapt NSVM to data with a large number of dimensions but a small number of data points, which is common in bioinformatics and text categorization. Applying ARC-x4 to NSVM then lets the new algorithm quickly classify data in which both the number of data points and the number of dimensions are large, on a personal computer. Experimental results on bio-medical datasets show that ARC-x4-NSVM trains fast and classifies accurately compared with LibSVM.

In the near future, we plan to develop a parallel algorithm to speed up the training and classification of the SVM algorithm.

REFERENCES

1. K. Bennett and C. Campbell. Support vector machines: Hype or hallelujah? SIGKDD Explorations, 2(2): 1-13, 2000.

2. B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. ACM Annual Workshop on Computational Learning Theory, pages 144-152, 1992.

3. L. Breiman. Arcing classifiers. The Annals of Statistics, 26(3): 801-849, 1998.

4. C-C. Chang and C-J. Lin. Libsvm - a library for support vector machines. 2001-2014.

5. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.

6. Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In EuroCOLT, 1995, pp. 23-37.

7. G. Golub and C. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, 1996.

8. I. Guyon. Web page on SVM applications. 1999-2014.

9. L. Jinyan and L. Huiqing. Kent Ridge bio-medical dataset repository. 2002.

10. O. Mangasarian. A finite Newton method for classification problems. Data Mining Institute Technical Report 01-11, Computer Sciences Department, University of Wisconsin, 2001.

11. J. Platt. Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods - Support Vector Learning, pages 185-208, 1999.

12. F. Poulet and T-N. Do. Mining very large datasets with support vector machine algorithms. Enterprise Information Systems V, pages 177-184, 2004.

13. N. Syed, H. Liu, and K. Sung. Incremental learning with support vector machines. ACM SIGKDD, 1999.

14. S. Tong and D. Koller. Support vector machine active learning with applications to text classification. ICML, pages 999-1006, 2000.

15. V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
