SCIENCE - TECHNOLOGY
ATTRIBUTE REDUCTION IN DECISION TABLES BASED ON THE FUZZY ROUGH SET APPROACH
Nguyễn Văn Thiện, Nguyễn Long Giang, Nguyễn Như Sơn
Abstract
Rough set theory is an effective tool for attribute reduction and rule extraction in decision tables whose attribute value domains are discrete. In practice, attribute values may be real-valued or semantic. For such decision tables, rough set based attribute reduction methods are not effective because the information content cannot be fully retained. This paper presents an attribute reduction approach for decision tables based on fuzzy rough sets, an integrated use of fuzzy set and rough set theories. An illustrative example shows that fuzzy-rough reduction is more powerful than the conventional rough set based approach.
Keywords: rough set, fuzzy set, fuzzy rough set, decision table, attribute reduction, reduct
MSc. Nguyễn Văn Thiện, Hanoi University of Industry
Dr. Nguyễn Long Giang, Dr. Nguyễn Như Sơn, Institute of Information Technology, Vietnam Academy of Science and Technology
Email: nvthien1970@gmail.com; nlgiang@ioit.ac.vn; nnson@ioit.ac.vn
Received: 20/6/2014. Accepted for publication: 20/8/2014
I. INTRODUCTION
Attribute reduction is an important problem in the data preprocessing step. Its goal is to reduce the dimensionality of the data (the number of attributes) in order to improve the efficiency of data mining and machine learning algorithms. Rough set theory, proposed by Pawlak [5], is regarded as an effective tool for attribute reduction. Attribute reduction methods based on rough set theory all operate on decision tables with discrete value domains. In practice, the attribute value domains of a decision table often contain continuous or semantic values. To handle this, rough set theory applies data discretization methods before attribute reduction. However, the degree to which each original value belongs to its discretized category is not taken into account. For example, two original attribute values may both be converted to the same value "Positive", but we no longer know which of them is more positive; that is, discretization methods do not preserve the semantics of the data. To solve this problem, D. Dubois et al. proposed the fuzzy rough set model [3], which combines rough set theory [5] and fuzzy set theory [4]. Fuzzy set theory preserves the semantics of the data, while rough set theory preserves its indiscernibility.
As in the traditional rough set model, the fuzzy rough set model uses a fuzzy similarity relation to approximate fuzzy sets by lower and upper approximations [2]. To date, many works have published axiom systems and properties of the operators in the fuzzy rough set model [2, 3, 8, 9]. However, research results on attribute reduction using the fuzzy rough set model, and practical applications of it, remain limited.
Building on the attribute reduction method that uses the dependency between attributes in traditional rough set theory [6], in this paper we propose an attribute reduction method for decision tables that uses the membership function of the fuzzy rough set model. An illustrative example shows that the method based on fuzzy rough sets is more effective than the method based on traditional rough set theory. The paper presents basic concepts and the attribute reduction method using the dependency between attributes in traditional rough set theory, followed by basic concepts of the fuzzy rough set model and the attribute reduction method using the membership function. On that basis, the paper outlines directions for further development.
Journal of Science & Technology (Tạp chí Khoa học & Công nghệ), No. 23, 2014
II. ATTRIBUTE REDUCTION USING THE TRADITIONAL ROUGH SET APPROACH
This section presents basic concepts of rough set theory and the attribute reduction method that uses the dependency between attributes [5, 6].
An information system is a pair $IS = (U, A)$, where $U$ is a finite, nonempty set of objects and $A$ is a finite, nonempty set of attributes. Each attribute $a \in A$ defines a mapping $a : U \to V_a$, where $V_a$ is the value set of attribute $a$. Each attribute subset $P \subseteq A$ defines an equivalence relation:
$$IND(P) = \{(u, v) \in U \times U \mid \forall a \in P,\ a(u) = a(v)\}$$
The partition of $U$ induced by the relation $IND(P)$ is denoted $U/P$, where:
$$U/P = \otimes\{a \in P : U/IND(\{a\})\}$$
with $A \otimes B = \{X \cap Y : \forall X \in A,\ \forall Y \in B,\ X \cap Y \neq \emptyset\}$.
If $(x, y) \in IND(P)$, then $x$ and $y$ are indiscernible by the attributes in $P$. The equivalence class containing object $u$ is denoted $[u]_P$, where $[u]_P = \{v \in U \mid (u, v) \in IND(P)\}$. For $B \subseteq A$ and $X \subseteq U$, the sets $\underline{B}X = \{u \in U \mid [u]_B \subseteq X\}$ and $\overline{B}X = \{u \in U \mid [u]_B \cap X \neq \emptyset\}$ are called the B-lower approximation and the B-upper approximation of $X$, respectively.
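The definitions above can be illustrated with a minimal Python sketch. The six-object table `TABLE` and the function names are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import defaultdict

# Hypothetical toy information system (not from the paper): object -> attribute values.
TABLE = {
    1: {"a": 0, "b": 0},
    2: {"a": 0, "b": 1},
    3: {"a": 0, "b": 0},
    4: {"a": 1, "b": 0},
    5: {"a": 1, "b": 0},
    6: {"a": 2, "b": 1},
}

def partition(table, attrs):
    """U/P: group objects that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def lower_approx(table, attrs, target):
    """Union of the equivalence classes fully contained in target."""
    return set().union(*(c for c in partition(table, attrs) if c <= target))

def upper_approx(table, attrs, target):
    """Union of the equivalence classes that intersect target."""
    return set().union(*(c for c in partition(table, attrs) if c & target))

X = {1, 3, 6}
print(sorted(lower_approx(TABLE, ["a"], X)))       # [6]
print(sorted(upper_approx(TABLE, ["a"], X)))       # [1, 2, 3, 6]
print(sorted(lower_approx(TABLE, ["a", "b"], X)))  # [1, 3, 6]
```

With attribute a alone, objects 1, 2 and 3 are indiscernible, so only object 6 is certainly in X. Adding b separates object 2, and the lower and upper approximations then coincide with X.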
Consider the information system $IS = (U, A)$ with $P, Q \subseteq A$. The set $POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X$ is called the P-positive region of $Q$. Clearly, $POS_P(Q)$ is the set of objects in $U$ that are classified correctly into the classes of $U/Q$ using the attribute set $P$. For $P, Q \subseteq A$, the quantity $k = \gamma_P(Q)$ expresses the dependency of $Q$ on $P$, denoted $P \Rightarrow_k Q$, and is defined as:
$$k = \gamma_P(Q) = \frac{|POS_P(Q)|}{|U|} \tag{1}$$
where $|S|$ is the cardinality of the set $S$. If $k = 1$, then $Q$ depends totally on $P$. If $0 < k < 1$, then $Q$ depends partially on $P$. For $P \subseteq A$ and $X \subseteq U$, the membership function of an object $x \in U$ is defined as:
$$\mu_X^P : U \to [0, 1], \qquad \mu_X^P(x) = \frac{|[x]_P \cap X|}{|[x]_P|} \tag{2}$$
The membership function characterizes the degree of inclusion of $[x]_P$ in the object set $X$. With this definition, formula (1) for the dependency between attributes can be expressed as:
$$k = \gamma_P(Q) = \frac{\sum_{x \in U} \mu_{POS_P(Q)}(x)}{|U|} \tag{3}$$
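As a hedged illustration of the positive region, the dependency degree and the membership function, the sketch below evaluates them on a small decision table. The table `ROWS` and all function names are hypothetical, not taken from the paper; exact fractions are used so that the ratios print as written in the formulas.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy decision table (not from the paper).
ROWS = {
    1: {"a": 0, "b": 0, "d": "no"},
    2: {"a": 0, "b": 1, "d": "yes"},
    3: {"a": 0, "b": 0, "d": "no"},
    4: {"a": 1, "b": 0, "d": "yes"},
    5: {"a": 1, "b": 0, "d": "yes"},
    6: {"a": 2, "b": 1, "d": "no"},
}

def partition(rows, attrs):
    """U/P: group objects that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in rows.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def positive_region(rows, p, q):
    """POS_P(Q): objects whose P-class lies inside a single Q-class."""
    q_classes = partition(rows, q)
    return {x for c in partition(rows, p)
            if any(c <= qc for qc in q_classes) for x in c}

def gamma(rows, p, q):
    """Dependency degree |POS_P(Q)| / |U|."""
    return Fraction(len(positive_region(rows, p, q)), len(rows))

def rough_membership(rows, p, target, x):
    """Share of the P-class of x that falls inside target."""
    cls = next(c for c in partition(rows, p) if x in c)
    return Fraction(len(cls & target), len(cls))

print(gamma(ROWS, ["a"], ["d"]))                     # 1/2
print(gamma(ROWS, ["a", "b"], ["d"]))                # 1
print(rough_membership(ROWS, ["a"], {1, 3, 6}, 1))   # 2/3
```

Here d depends only partially on {a} (k = 1/2) and totally on {a, b} (k = 1).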
A decision table is a special form of information system in which the attribute set $A$ consists of two disjoint subsets: the set of condition attributes $C$ and the set of decision attributes $D$. Thus, a decision table is an information system $DS = (U, C \cup D)$ with $C \cap D = \emptyset$. The decision table $DS$ is called consistent if and only if $POS_C(D) = U$; otherwise $DS$ is inconsistent.
Attribute reduction is the process of selecting a minimal subset of the condition attributes that preserves the classification information of the decision table; such a subset is called a reduct. In the traditional rough set approach, Pawlak [6] introduced the notion of a reduct based on the positive region and built a heuristic algorithm that finds a best reduct of the decision table, using attribute significance as the evaluation criterion.
Definition 1. [6] Let $DS = (U, C \cup D)$ be a decision table and let $R \subseteq C$ be an attribute set. If
1) $POS_R(D) = POS_C(D)$
2) $\forall r \in R,\ POS_{R - \{r\}}(D) \neq POS_C(D)$
then $R$ is a reduct of $C$ based on the positive region.
Definition 2. Let $DS = (U, C \cup D)$ be a decision table, $B \subseteq C$ and $b \in C - B$. The significance of attribute $b$ with respect to the attribute set $B$ is defined by:
$$SIG_B(b) = \gamma_{B \cup \{b\}}(D) - \gamma_B(D) = \frac{|POS_{B \cup \{b\}}(D)| - |POS_B(D)|}{|U|} \tag{4}$$
with the assumption $|POS_{\emptyset}(D)| = 0$.
Clearly $|POS_{B \cup \{b\}}(D)| \geq |POS_B(D)|$, so $SIG_B(b) \geq 0$. $SIG_B(b)$ measures the change in the dependency of $D$ on $B$ when attribute $b$ is added to $B$: the larger $SIG_B(b)$, the larger the change and the more important attribute $b$, and vice versa. This attribute significance is the criterion for selecting attributes in the heuristic algorithm that finds a reduct of the decision table.
The idea of the heuristic algorithm for finding a best reduct is to start from the empty set $R = \emptyset$ and repeatedly add to $R$ the attribute with the greatest significance until a reduct is obtained.
Algorithm 1. Heuristic algorithm for finding a best reduct using the dependency between attributes.
Input: Decision table $DS = (U, C \cup D)$
Output: A best reduct $R$.
1. $R \leftarrow \emptyset$;
2. While $\gamma_R(D) \neq \gamma_C(D)$ do
3. Begin
4.    For $c \in C - R$ compute $SIG_R(c) = \gamma_{R \cup \{c\}}(D) - \gamma_R(D)$;
5.    Choose $c_m \in C - R$ such that $SIG_R(c_m) = \max_{c \in C - R}\{SIG_R(c)\}$;
6.    $R \leftarrow R \cup \{c_m\}$;
7. End;
8. Return $R$;
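The greedy loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the decision table `ROWS` is hypothetical, ties between equally significant attributes are broken by list order, and the dependency for the empty attribute set is computed from the single class U.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy decision table (not from the paper).
ROWS = {
    1: {"a": 0, "b": 0, "c": 0, "d": "no"},
    2: {"a": 0, "b": 1, "c": 0, "d": "yes"},
    3: {"a": 0, "b": 0, "c": 1, "d": "no"},
    4: {"a": 1, "b": 0, "c": 1, "d": "yes"},
    5: {"a": 1, "b": 0, "c": 0, "d": "yes"},
    6: {"a": 2, "b": 1, "c": 1, "d": "no"},
}

def partition(rows, attrs):
    """U/P: group objects that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in rows.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def gamma(rows, p, q):
    """Dependency degree |POS_P(Q)| / |U|."""
    q_classes = partition(rows, q)
    pos = sum(len(c) for c in partition(rows, p)
              if any(c <= qc for qc in q_classes))
    return Fraction(pos, len(rows))

def quick_reduct(rows, cond, dec):
    """Add the most significant attribute until gamma_R(D) equals gamma_C(D)."""
    reduct = []
    target = gamma(rows, cond, dec)
    while gamma(rows, reduct, dec) != target:
        base = gamma(rows, reduct, dec)
        candidates = [c for c in cond if c not in reduct]
        # SIG_R(c) = gamma_{R + c}(D) - gamma_R(D); max() keeps the first on ties.
        best = max(candidates, key=lambda c: gamma(rows, reduct + [c], dec) - base)
        reduct.append(best)
    return reduct

print(quick_reduct(ROWS, ["a", "b", "c"], ["d"]))  # ['a', 'b']
```

On this toy table, attribute a is chosen first (dependency 1/2 against 0 for b and c), and adding b then raises the dependency to 1, which equals the dependency on the full condition set.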
III. ATTRIBUTE REDUCTION USING THE FUZZY ROUGH SET APPROACH
This section presents the fuzzy rough set approach [3] for solving the attribute reduction problem on fuzzy decision tables, that is, decision tables whose attribute value domains consist of fuzzy values.
The fuzzy rough set model is built by combining rough set theory and fuzzy set theory in order to approximate fuzzy sets using a fuzzy similarity relation [3, 8, 9]. A fuzzy similarity relation $S$ on the object space $U$ satisfies the following properties: reflexivity ($\mu_S(x, x) = 1$), symmetry ($\mu_S(x, y) = \mu_S(y, x)$) and transitivity ($\mu_S(x, z) \geq \mu_S(x, y) \wedge \mu_S(y, z)$). As in traditional rough set theory, based on the fuzzy similarity relation, each attribute set $P \subseteq A$ defines a fuzzy partition as follows:
$$U/P = \otimes\{a \in P : U/IND(\{a\})\} \tag{5}$$
Each element of $U/P$ is a fuzzy equivalence class $[x]_P$ with $\mu_{[x]_P}(y) = \mu_P(x, y)$. For example, if $P = \{a, b\}$, $U/IND(\{a\}) = \{N_a, Z_a\}$ and $U/IND(\{b\}) = \{N_b, Z_b\}$, then $U/P = \{N_a \cap N_b,\ N_a \cap Z_b,\ Z_a \cap N_b,\ Z_a \cap Z_b\}$. The membership function of objects in a fuzzy equivalence class is defined according to fuzzy set theory:
$$\mu_{F_1 \cap \dots \cap F_n}(x) = \min\left(\mu_{F_1}(x), \mu_{F_2}(x), \dots, \mu_{F_n}(x)\right) \tag{6}$$
Based on the fuzzy equivalence classes, the notions of lower and upper approximation are extended to the fuzzy lower approximation and the fuzzy upper approximation. For an attribute set $P \subseteq A$, the membership functions of objects in the fuzzy lower approximation and the fuzzy upper approximation are defined as [8, 9]:
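$$\mu_{\underline{P}X}(x) = \sup_{F \in U/P} \min\left(\mu_F(x),\ \inf_{y \in U} \max\{1 - \mu_F(y),\ \mu_X(y)\}\right) \tag{7}$$
$$\mu_{\overline{P}X}(x) = \sup_{F \in U/P} \min\left(\mu_F(x),\ \sup_{y \in U} \min\{\mu_F(y),\ \mu_X(y)\}\right) \tag{8}$$
where $\inf X$ and $\sup X$ denote the greatest lower bound and the least upper bound of the set $X$, respectively, and $F$ ranges over the fuzzy equivalence classes of the fuzzy partition $U/P$. The pair $\langle \underline{P}X, \overline{P}X \rangle$ is called a fuzzy rough set.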
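Formulas (7) and (8) can be evaluated directly, as in the sketch below. It uses the values of attribute a from Table 1 of Example 1 and the target set X = {1, 3, 6}. The shapes of the fuzzy sets N and Z are an assumption: Figure 1 is not legible in the source, so `mu_n` and `mu_z` are guessed as piecewise-linear functions that give 0.8 and 0.2 at the value -0.4, the two membership degrees the example uses for object 2.

```python
from fractions import Fraction as Fr

# Attribute a of Table 1 (Example 1); exact fractions avoid float noise.
A_VALUES = [Fr(-4, 10), Fr(-4, 10), Fr(-3, 10), Fr(3, 10), Fr(2, 10), Fr(2, 10)]

# ASSUMPTION: shapes of N and Z are guessed, Figure 1 is not legible in the source.
def mu_n(v):
    """N: full membership at -0.5, falling linearly to 0 at 0."""
    return min(Fr(1), max(Fr(0), -2 * v))

def mu_z(v):
    """Z: triangle with peak 1 at 0 and feet at -0.5 and 0.5."""
    return max(Fr(0), 1 - 2 * abs(v))

def fuzzy_lower(classes, mu_x):
    """Formula (7): sup over F of min(mu_F(x), inf_y max(1 - mu_F(y), mu_X(y)))."""
    n = len(mu_x)
    return [max(min(f[x], min(max(1 - f[y], mu_x[y]) for y in range(n)))
                for f in classes) for x in range(n)]

def fuzzy_upper(classes, mu_x):
    """Formula (8): sup over F of min(mu_F(x), sup_y min(mu_F(y), mu_X(y)))."""
    n = len(mu_x)
    return [max(min(f[x], max(min(f[y], mu_x[y]) for y in range(n)))
                for f in classes) for x in range(n)]

classes = [[mu_n(v) for v in A_VALUES], [mu_z(v) for v in A_VALUES]]  # U/A = {N_a, Z_a}
mu_x = [1, 0, 1, 0, 0, 1]  # crisp set X = {1, 3, 6}

lower = fuzzy_lower(classes, mu_x)
upper = fuzzy_upper(classes, mu_x)
print([float(v) for v in lower])  # [0.2, 0.2, 0.4, 0.4, 0.4, 0.4]
print([float(v) for v in upper])  # [0.8, 0.8, 0.6, 0.4, 0.6, 0.6]
```

Under these assumed membership functions, every object belongs to the lower approximation to a degree no greater than its degree in the upper approximation, as expected of a fuzzy rough set.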
In traditional rough set theory, the positive region is defined as the union of all lower approximations. For $P, Q \subseteq A$, the membership function of an object in the fuzzy positive region of the fuzzy rough set model is defined as:
$$\mu_{POS_P(Q)}(x) = \sup_{X \in U/Q} \mu_{\underline{P}X}(x) \tag{9}$$
Based on the notion of the fuzzy positive region, the membership function expressing the dependency between attributes is defined as:
$$\gamma'_P(Q) = \frac{|\mu_{POS_P(Q)}(x)|}{|U|} = \frac{\sum_{x \in U} \mu_{POS_P(Q)}(x)}{|U|} \tag{10}$$
As with the attribute reduction algorithm that uses the dependency between attributes (Algorithm 1), the attribute reduction algorithm for fuzzy decision tables that uses the membership function in formula (10) is described as follows:
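Algorithm 2. Heuristic algorithm for finding a best reduct using the membership function.
Input: Decision table $DS = (U, C \cup D)$
Output: A best reduct $R$.
1. $R \leftarrow \emptyset$; $\gamma'_{best} \leftarrow 0$; $\gamma'_{prev} \leftarrow 0$;
2. Repeat
3.    $T \leftarrow R$;
4.    $\gamma'_{prev} \leftarrow \gamma'_{best}$;
5.    For $c \in C - R$ do
6.       If $\gamma'_{R \cup \{c\}}(D) > \gamma'_T(D)$ then
7.          $T \leftarrow R \cup \{c\}$;
8.          $\gamma'_{best} \leftarrow \gamma'_T(D)$;
9.    $R \leftarrow T$;
10. Until $\gamma'_{prev} = \gamma'_{best}$;
11. Return $R$;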
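Algorithm 2 can be sketched in Python as below, using the data of Table 1 in Example 1. This is an illustrative sketch, not the authors' code. The membership functions `mu_n` and `mu_z` are an assumption (Figure 1 is not legible in the source); they were chosen so that the value -0.4 has degrees 0.8 in N and 0.2 in Z, as used in the example. Exact fractions keep the comparison in step 6 free of floating-point noise, and the dependency of the empty attribute set is taken as 0.

```python
from fractions import Fraction as Fr
from itertools import product

# Table 1 of Example 1, values stored in tenths so arithmetic stays exact.
DATA = {
    "a": [-4, -4, -3, 3, 2, 2],
    "b": [-3, 2, -4, -3, -3, 0],
    "c": [-5, -1, -3, 0, 0, 0],
}
DECISION = ["No", "Yes", "No", "Yes", "Yes", "No"]
N_OBJ = len(DECISION)

# ASSUMPTION: shapes of N and Z are guessed, Figure 1 is not legible in the source.
def mu_n(v):
    return min(Fr(1), max(Fr(0), -2 * v))

def mu_z(v):
    return max(Fr(0), 1 - 2 * abs(v))

def fuzzy_partition(attrs):
    """U/P by formulas (5) and (6): min-intersection of one fuzzy set per attribute."""
    per_attr = []
    for a in attrs:
        vals = [Fr(t, 10) for t in DATA[a]]
        per_attr.append([[mu_n(v) for v in vals], [mu_z(v) for v in vals]])
    return [[min(col) for col in zip(*combo)] for combo in product(*per_attr)]

def gamma(attrs):
    """Formula (10): mean membership in the fuzzy positive region (0 for no attributes)."""
    if not attrs:
        return Fr(0)
    pos = [Fr(0)] * N_OBJ
    for label in set(DECISION):
        mu_x = [int(d == label) for d in DECISION]
        for f in fuzzy_partition(attrs):
            inf = min(max(1 - f[y], mu_x[y]) for y in range(N_OBJ))
            pos = [max(pos[x], min(f[x], inf)) for x in range(N_OBJ)]
    return sum(pos) / N_OBJ

def fuzzy_quick_reduct(cond):
    """Greedy loop of Algorithm 2: stop when the best dependency no longer grows."""
    reduct, best = [], Fr(0)
    while True:
        candidate, prev = reduct, best
        for c in cond:
            if c not in reduct and gamma(reduct + [c]) > gamma(candidate):
                candidate = reduct + [c]
                best = gamma(candidate)
        reduct = candidate
        if best == prev:
            return sorted(reduct)

print(gamma(["a"]), gamma(["b"]), gamma(["c"]))  # 1/3 2/5 4/15
print(fuzzy_quick_reduct(["a", "b", "c"]))       # ['a', 'b']
```

The printed dependencies 1/3, 2/5 and 4/15 equal 2/6, 2.4/6 and 1.6/6, and the loop returns the reduct {a, b}.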
Example 1. Consider the decision table $DS = (U, C \cup \{d\})$ described in Table 1.
Table 1. Decision table for Example 1
Object    1      2      3      4      5      6
a        -0.4   -0.4   -0.3    0.3    0.2    0.2
b        -0.3    0.2   -0.4   -0.3   -0.3    0
c        -0.5   -0.1   -0.3    0      0      0
d         No     Yes    No     Yes    Yes    No
The values of attributes a, b and c are represented by two fuzzy sets N and Z, as shown in Figure 1.
[Figure 1. Membership functions of the fuzzy sets N and Z representing the attribute values of the decision table; horizontal axis from -0.5 to 0.5]
Thus, the decision table of Example 1 combined with the membership functions shown in Figure 1 gives a fuzzy decision table $DS = (U, C \cup \{d\})$ with the attribute sets $A = \{a\}$, $B = \{b\}$, $C = \{c\}$ (here $C$ denotes the single-attribute set, not the full condition set) and $Q = \{d\}$. The fuzzy equivalence classes generated by the attribute sets A, B and C are, respectively, $U/A = \{N_a, Z_a\}$, $U/B = \{N_b, Z_b\}$, $U/C = \{N_c, Z_c\}$, and $U/Q = \{\{1, 3, 6\}, \{2, 4, 5\}\}$. To find a reduct of the given decision table with Algorithm 2, the first step is to compute the lower approximations with respect to the attributes A, B and C.
Consider attribute A. For the equivalence class $X = \{1, 3, 6\}$, $\mu_{\underline{A}X}(x)$ is computed by formula (7):
$$\mu_{\underline{A}\{1,3,6\}}(x) = \sup_{F \in U/A} \min\left(\mu_F(x),\ \inf_{y \in U} \max\{1 - \mu_F(y),\ \mu_{\{1,3,6\}}(y)\}\right)$$
Consider the fuzzy equivalence class $N_a$ on attribute A:
$$\min\left(\mu_{N_a}(x),\ \inf_{y \in U} \max\{1 - \mu_{N_a}(y),\ \mu_{\{1,3,6\}}(y)\}\right)$$
For object 2 this gives:
$$\min(0.8,\ \inf\{1, 0.2, 1, 1, 1, 1\}) = 0.2$$
Similarly, for $Z_a$:
$$\min(0.2,\ \inf\{1, 0.8, 1, 0.6, 0.4, 1\}) = 0.2$$
Therefore $\mu_{\underline{A}\{1,3,6\}}(2) = 0.2$. Computing the A-lower approximation of $X = \{1, 3, 6\}$ for the other objects in the same way, we have:
$\mu_{\underline{A}\{1,3,6\}}(1) = 0.2$, $\mu_{\underline{A}\{1,3,6\}}(2) = 0.2$, $\mu_{\underline{A}\{1,3,6\}}(3) = 0.4$,
$\mu_{\underline{A}\{1,3,6\}}(4) = 0.4$, $\mu_{\underline{A}\{1,3,6\}}(5) = 0.4$, $\mu_{\underline{A}\{1,3,6\}}(6) = 0.4$.
Then the fuzzy positive region for each object is computed by the formula:
$$\mu_{POS_A(Q)}(x) = \sup_{X \in U/Q} \mu_{\underline{A}X}(x)$$
We obtain:
$\mu_{POS_A(Q)}(1) = 0.2$, $\mu_{POS_A(Q)}(2) = 0.2$, $\mu_{POS_A(Q)}(3) = 0.4$,
$\mu_{POS_A(Q)}(4) = 0.4$, $\mu_{POS_A(Q)}(5) = 0.4$, $\mu_{POS_A(Q)}(6) = 0.4$.
From this, the dependency of Q on A is computed as:
$$\gamma'_A(Q) = \frac{\sum_{x \in U} \mu_{POS_A(Q)}(x)}{|U|} = \frac{2}{6}$$
Computing in the same way for attributes B and C, we have $\gamma'_B(Q) = 2.4/6$ and $\gamma'_C(Q) = 1.6/6$. Attribute b has the largest value, so it is chosen and added to the reduct: $R = \{b\}$.
In the next iteration, with two attributes, the pair $\{a, b\}$ gives the larger value, $\gamma'_{\{a,b\}}(Q) = 3.4/6$, so attribute a is chosen and added to the reduct, giving $R = \{a, b\}$. Finally, adding attribute c gives $\gamma'_{\{a,b,c\}}(Q) = 3.4/6$, which does not change the dependency. Therefore the algorithm stops, and $R = \{a, b\}$ is the best reduct of the decision table. When Algorithm 1 is run on the given decision table after discretizing the data with the fuzzy sets above, the reduct obtained is $R = \{a, b, c\}$ [7]. We conclude that attribute reduction with the fuzzy rough set approach yields a smaller reduct than the traditional rough set method applied after data discretization.
IV. CONCLUSION
The fuzzy rough set model proposed by D. Dubois et al. combines rough set theory and fuzzy set theory: rough set theory preserves the indiscernibility of the data, while fuzzy set theory preserves its semantics (fuzziness). Fuzzy rough sets are therefore considered more effective than rough sets for attribute reduction and rule extraction on information systems whose attribute value domains are continuous, semantic or fuzzy. In this paper, building on the attribute reduction method that uses the dependency between attributes in rough set theory, we construct an attribute reduction method for fuzzy decision tables that uses the membership function of the fuzzy rough set model. The illustrative example shows that the fuzzy rough set approach is more effective than the traditional rough set approach. Future work is to develop efficient attribute reduction methods for fuzzy decision tables following the fuzzy rough set approach and to test them on practical problems.
Scientific reviewer: Prof. Dr. Vũ Đức Thi
REFERENCES
[1] Nguyễn Long Giang, Nghiên cứu một số phương pháp khai phá dữ liệu theo tiếp cận lý thuyết tập thô (A study of data mining methods based on the rough set approach), PhD thesis in Mathematics, Institute of Information Technology, 2012.
[2] D. Dubois and H. Prade, Putting rough sets and fuzzy sets together, Intelligent Decision Support, Kluwer Academic Publishers, Dordrecht, 1992.
[3] D. Dubois and H. Prade, Rough fuzzy sets and fuzzy rough sets, International Journal of General Systems, 17, pp. 191-209, 1990.
[4] L. A. Zadeh, Fuzzy sets, Information and Control, 8, pp. 338-353, 1965.
[5] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences, 11(5), pp. 341-356, 1982.
[6] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, 1991.
[7] R. Jensen and Q. Shen, Fuzzy-Rough Sets for Descriptive Dimensionality Reduction, Proceedings of the 11th International Conference on Fuzzy Systems, pp. 29-34, 2002.
[8] Y.Y. Yao, On combining rough and fuzzy sets, Proceedings of the CSC'95 Workshop on Rough Sets and Database Mining, T.Y. Lin (Ed.), San Jose State University, 9 pages, 1995.
[9] Y.Y. Yao, A Comparative Study of Fuzzy Sets and Rough Sets, Information Sciences, Vol. 109, pp. 21-…, 1998.