
Ngo Thi Lan et al. Tạp chí Khoa học & Công nghệ (Journal of Science and Technology) 132(02): 57-62



VIETNAMESE QUESTION ANALYSIS IN A COMMUNITY-BASED QUESTION ANSWERING SYSTEM

Ngo Thi Lan¹*, Tran Hong Quan², Nguyen Thi Thanh Nhan¹, Le Thu Trang¹
¹College of Information and Communication Technology - Thai Nguyen University
²College of Technology - Vietnam National University, Hanoi

ABSTRACT

A question answering system gives users accurate, concise answers to the questions they pose to the system in natural language. To make such systems more effective, current research aims to build automatic question answering systems on top of community interaction services (Yahoo! Answers, forums, wikis, social networks, ...). In these systems, question analysis is an important component, and it has attracted considerable attention from researchers. In this study, we review research directions in question analysis in general and the work applied to Vietnamese in particular. On this basis, we propose a new question taxonomy that better suits community-based question answering systems, and a method that applies dependency trees to improve question classification.

Keywords: question answering system, community-based question answering, dependency tree, Vietnamese question analysis, question classification.

INTRODUCTION

A question answering (QA) system is a special form of information retrieval. Instead of returning a set of documents, or highlighting passages related to the user's question as existing search engines (Google, Bing, ...) do, a QA system tries to find a correct, concise, and precise answer to a question posed in natural language [15]. Research on QA systems has been carried out worldwide since 1999, through conferences and workshops such as TREC, CLEF, and ACL, and has achieved some success. However, traditional QA systems still do not answer complex questions effectively, such as "why ...?" and "how ...?" questions, or questions that ask for opinions and assessments. The current approach is to use data from community interaction services (Q&A websites, forums, social networks, ...) as a resource for the system.

Both traditional QA systems and community-based question answering (CQA) systems have three main components: question processing, information retrieval, and answer processing [1]. Question processing consists of two main tasks: extracting the query information and classifying the question [1]. In this paper we focus on question classification for CQA systems. We propose a new question taxonomy that better fits community data, and apply dependency trees to improve the system's question classification results.

* Tel. 0943 870272. Email: ntlan@ictu.edu.vn

The paper is organized as follows. After this introduction, we review related work on question analysis worldwide and in Vietnam. The next section presents our dependency-tree-based question analysis method. We then report our experimental results, and finish with conclusions and references.

OVERVIEW OF QUESTION ANALYSIS IN COMMUNITY-BASED QUESTION ANSWERING SYSTEMS

Overall system architecture

QA systems have been studied extensively worldwide and are fairly mature. In Vietnam, several research groups have worked on QA and achieved some success [3] [4] [20].

In our work, we build an open-domain system that aims to answer complex questions more effectively, using resources from community interaction services.


We build a community-based question answering system for Vietnamese (VnCQA, Vietnamese Community-based Question Answering system). Its architecture is shown in Figure 1, and its main components are:

Figure 1: Overall architecture of the VnCQA system

(1) Building and processing data from community interaction services (forums, wikis, Q&A websites, ...).

(2) Question analysis: this step builds the query used to retrieve relevant documents and finds information that is useful for the answer extraction step.

(3) Answer extraction, classification, and display: this step uses the query produced by question analysis to find similar questions. It retrieves the question-answer pairs whose questions are similar to the input question, ranks these similar questions, and displays the results.
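Step (3) can be illustrated with a minimal sketch that ranks archived question-answer pairs by the cosine similarity between their questions and the input question. This is an assumption for illustration: the text does not say which similarity measure VnCQA uses, and the sketch treats whitespace-separated syllables as tokens instead of running a Vietnamese word segmenter. The names `bow`, `cosine`, `rank_similar`, and the toy archive are hypothetical.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_similar(query: str, qa_pairs: list[tuple[str, str]], top_k: int = 3) -> list[tuple[float, str, str]]:
    """Score every stored (question, answer) pair by question similarity to the query, best first."""
    q = bow(query)
    scored = [(cosine(q, bow(question)), question, answer) for question, answer in qa_pairs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical archive of forum question-answer pairs (diacritics stripped for brevity).
ARCHIVE = [
    ("cach xoa tin nhan tren iphone", "answer 1"),
    ("vi sao dien thoai hay bi mat hinh", "answer 2"),
    ("cach vao facebook tren iphone", "answer 3"),
]

for sim, question, answer in rank_similar("xoa tin nhan iphone", ARCHIVE, top_k=2):
    print(f"{sim:.3f}  {question}  ->  {answer}")
```

Cosine similarity is used here only because it is a common, simple baseline for comparing short texts; any of the similarity features mentioned in the paper could be substituted.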

Question classification methods

Question classification is an important component that directly affects the performance of a QA system. There are two approaches to question classification: rule-based and machine-learning-based. Some hybrid approaches combine the two (Huang et al., 2008 [8]; Ray et al., 2010 [18]; Silva et al., 2011 [9]). The rule-based approach tries to match questions against a set of hand-written rules (Hull, 1999 [6]; Prager et al., 1999 [10]). However, it requires defining a very large number of rules. Li and Roth (2004) [21] gave several examples showing why classification is difficult in this approach: it is hard to build a hand-crafted classifier with a limited number of rules. For this reason, most current work relies mainly on machine learning or on hybrid approaches. Zhang and Lee [22] used Support Vector Machines (SVM) to classify questions, reaching 90% accuracy on the coarse-grained TREC classes. Tran Hai Dang et al. [4] also used SVM to classify Vietnamese questions, with bag-of-words and keyword features, reaching 94.1% accuracy on coarse-grained classes and 83.4% on fine-grained classes. We tested Tran Hai Dang's method on our own data and obtained low results, because our data, collected from forums, is very noisy.
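To make the SVM-based approach concrete, the sketch below trains a one-vs-rest linear SVM on bag-of-words features with per-example subgradient steps on the L2-regularized hinge loss, using only the Python standard library. It is a simplified illustration, not the exact setup of the cited studies: the toy questions, the labels (borrowed from the VnCQA taxonomy), and the hyperparameters `epochs`, `lam`, and `lr` are hypothetical.

```python
from collections import Counter, defaultdict


def features(question: str) -> Counter:
    """Bag-of-words (unigram) counts plus a constant bias feature."""
    x = Counter(question.lower().split())
    x["__bias__"] = 1
    return x


def score(w: dict, x: Counter) -> float:
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_ovr_svm(data, labels, epochs=100, lam=0.01, lr=0.1):
    """One binary linear SVM per class (that class vs. the rest), trained by subgradient descent."""
    weights = {c: defaultdict(float) for c in labels}
    for _ in range(epochs):
        for question, gold in data:
            x = features(question)
            for c, w in weights.items():
                y = 1.0 if c == gold else -1.0
                margin = y * score(w, x)
                for t in w:                 # L2 regularisation: shrink every weight (bias included)
                    w[t] *= 1.0 - lr * lam
                if margin < 1.0:            # hinge loss active: push the example to the correct side
                    for t, v in x.items():
                        w[t] += lr * y * v
    return weights


def predict(weights, question: str) -> str:
    """Return the class whose binary classifier gives the highest score."""
    x = features(question)
    return max(weights, key=lambda c: score(weights[c], x))


# Hypothetical toy training data (syllables without diacritics, no word segmentation).
TRAIN = [
    ("tam dan man hinh tu tinh la gi", "fact"),
    ("iphone la gi", "fact"),
    ("vi sao dien thoai hay bi mat hinh", "explanation"),
    ("tai sao may tinh chay cham", "explanation"),
    ("chi cho em cach vao facebook tren iphone", "solution"),
    ("cach xoa tin nhan tren iphone", "solution"),
]
model = train_ovr_svm(TRAIN, ["fact", "explanation", "solution"])
```

A fixed learning rate keeps the code short; practical systems typically use a decaying step size or a dedicated SVM library, and richer features than raw unigrams.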

We propose the following classification method for our VnCQA system:

[Figure 2 diagram; recoverable labels: Question, Question analysis, Tokenize, Word list, Find similar words, Similar words, Keywords, Question type]

Figure 2: Question classification architecture

From the list of words obtained by analysing the user's question, we use an SVM classifier to assign questions to the different question types, with the following features: bag of words (unigrams), bigrams, word pairs, and similarity. To improve performance, we combine this with similar-word search and dictionary-based keyword extraction. Our contribution is a new question taxonomy suited to community data, and the application of dependency trees to the similar-word search step.
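The unigram, bigram, and word-pair features named above can be generated as sparse binary features, as in the sketch below. The text does not define "word pairs" precisely; here they are assumed to be all ordered co-occurrences of two words in the question (adjacent or not). The similarity feature is left out because its definition is not given. The input is assumed to be already word-segmented (multi-syllable words joined with underscores), and the feature-name prefixes are hypothetical.

```python
from itertools import combinations


def question_features(tokens: list[str]) -> set[str]:
    """Sparse binary features for a word-segmented question (feature names are hypothetical)."""
    feats: set[str] = set()
    feats.update(f"uni={t}" for t in tokens)                              # unigrams (bag of words)
    feats.update(f"bi={a}+{b}" for a, b in zip(tokens, tokens[1:]))       # adjacent bigrams
    feats.update(f"pair={a}|{b}" for a, b in combinations(tokens, 2))     # assumed: all ordered word pairs
    return feats


print(sorted(question_features(["cách", "xóa", "tin_nhắn"])))
```

Because word pairs include non-adjacent combinations, they can capture cues such as a question word and a distant verb that bigrams miss, at the cost of a quadratic number of features per question.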

QUESTION TAXONOMY

A taxonomy (or question ontology) is the set of question types (classes) used to classify questions. Many taxonomies have been proposed. Li and Roth [21] proposed a taxonomy of 6 coarse-grained and 50 fine-grained classes. Hermjakob et al. [19] proposed a taxonomy of 180 classes. Metzler and Croft [7] proposed a taxonomy with two classes: list questions and yes-no-explain questions. A taxonomy frequently mentioned in question classification research classifies questions by their question word: what, when, where, which, who/whom/whose, why, how [24]. Broder [2] divided web search queries into three types: navigational, informational, and transactional. Rose and Levinson [17] used three classes: navigational, informational, and resource.

For their question answering system based on community interaction services, Zhicheng Zheng et al. [24] proposed a taxonomy of five types: questions asking for facts, for reasons, for solutions, for definitions, and for resources.

Our question classification satisfies three conditions:

- The question types reflect the asker's intent in posing the question.

- Each question type may call for a different, suitable answer-synthesis strategy.

- The types must cover the community data well.

For the VnCQA system, based on observation of the data and the conditions above, we propose a taxonomy of three question types:

Fact: questions about general facts and the attributes of entities; the expected answer can be summarized in a short passage, a list of passages, or simple sentences. This type includes smaller sub-types: definition, example, identification, list, and navigation. Example: "Tấm dán màn hình từ tính là gì?" ("What is a magnetic screen protector?").

Explanation: questions that ask for an explanation or an opinion. This type includes smaller sub-types such as comparison and reason. Example: "Vì sao điện thoại của mình hay bị mất hình?" ("Why does my phone's display keep going blank?").

Solution: questions that seek a solution to some problem; the expected answer is detailed instructions for a specific problem. Example: "Chỉ cho em cách vào facebook trên Iphone" ("Show me how to get onto Facebook on an iPhone").
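One way to carry the proposed three-type taxonomy into code is an enumeration that records, for each type, the sub-types and the example question given in its description. This is only an illustrative data structure: `QuestionType` and `parent_type` are hypothetical names and do not come from the VnCQA implementation.

```python
from enum import Enum


class QuestionType(Enum):
    """VnCQA question types. Each value holds (sub-types named in the text, example question)."""
    FACT = (("definition", "example", "identification", "list", "navigation"),
            "Tấm dán màn hình từ tính là gì?")
    # The text lists comparison and reason followed by "...", so this tuple is not exhaustive.
    EXPLANATION = (("comparison", "reason"),
                   "Vì sao điện thoại của mình hay bị mất hình?")
    # No sub-types are named for Solution.
    SOLUTION = ((), "Chỉ cho em cách vào facebook trên Iphone")

    @property
    def subtypes(self) -> tuple:
        return self.value[0]

    @property
    def example(self) -> str:
        return self.value[1]


def parent_type(subtype: str) -> QuestionType:
    """Map a fine-grained sub-type to its top-level VnCQA type."""
    for qtype in QuestionType:
        if subtype in qtype.subtypes:
            return qtype
    raise KeyError(subtype)


for qtype in QuestionType:
    print(qtype.name, qtype.subtypes, qtype.example)
```

Keeping sub-types under a top-level type lets a classifier predict either level, falling back to the coarse type when the fine-grained label is uncertain.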

DEPENDENCY TREE

Dependency grammar and dependency trees

Dependency syntax is a syntactic structure consisting of lexical items linked by binary, asymmetric relations called dependencies. These dependency relations can be labelled to make the relationship between two words explicit [13]. Formally, the dependency structure of a given sentence is a directed graph whose root is an artificial node, usually inserted at the left end of the sentence, and whose other nodes are the words of the sentence. For example, for the input question "hỏi cách xóa tin nhắn trên Iphone" ("asking how to delete messages on an iPhone"), the dependency tree is as follows:

Figure 3: Example of a dependency tree

where dob: direct object; pob: object of a preposition; vmod: verbal modifier; nmod: noun modifier; loc: location.
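The formal definition above (a directed graph over the words plus an artificial root node) can be encoded as a list of labelled head-dependent arcs, together with a check that every word has exactly one head and that following head links from any word reaches the root without a cycle. The arcs below are a hypothetical parse of the example question built from the relation labels just listed (plus an assumed `root` label); they are not read off Figure 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: int        # index of the head node; 0 is the artificial ROOT
    dependent: int   # 1-based index of the dependent word
    label: str       # relation label, e.g. "dob", "pob", "loc"


def is_tree(n_words: int, arcs: list[Arc]) -> bool:
    """True if every word 1..n has exactly one head and every word reaches ROOT without a cycle."""
    heads: dict[int, int] = {}
    for arc in arcs:
        if not 1 <= arc.dependent <= n_words or arc.dependent in heads:
            return False
        heads[arc.dependent] = arc.head
    if len(heads) != n_words:
        return False
    for start in heads:
        seen, node = set(), start
        while node != 0:
            if node in seen or node not in heads:
                return False  # a cycle, or a head index that is not a valid node
            seen.add(node)
            node = heads[node]
    return True


# Hypothetical parse of "hỏi cách xóa tin_nhắn trên Iphone" (not taken from Figure 3).
WORDS = ["hỏi", "cách", "xóa", "tin_nhắn", "trên", "Iphone"]
ARCS = [
    Arc(0, 1, "root"),   # hỏi attaches to the artificial root
    Arc(1, 2, "dob"),    # cách: direct object of hỏi
    Arc(2, 3, "vmod"),   # xóa: verbal modifier of cách
    Arc(3, 4, "dob"),    # tin_nhắn: direct object of xóa
    Arc(3, 5, "loc"),    # trên: location of the deletion
    Arc(5, 6, "pob"),    # Iphone: object of the preposition trên
]

for arc in ARCS:
    head = "ROOT" if arc.head == 0 else WORDS[arc.head - 1]
    print(f"{arc.label}({head}, {WORDS[arc.dependent - 1]})")
```

Storing only the head index of each word is enough to recover the whole tree, which is why dependency parsers usually output one head and one relation label per token.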

Figure 4: Example of a phrase-structure (constituency) tree

[Figure 5 diagram; recoverable labels: Input question, Dependency tree]

Figure 5: Keyword identification using the dependency tree

The input question is first word-segmented (tokenized) and then passed to a dependency parser (Nguyen et al. [19]). The parser produces a tree that encodes the tree structure, the relations between words, and other information such as part of speech (POS). These features are then extracted from the dependency tree and used to train an SVM model, which classifies each word as a keyword or not. An example of dependency tree data follows:


Table 1: Example of dependency tree data

ID  FORM  LEMMA  CPOSTAG  POSTAG
1   hỏi   hỏi    V        V
2   cách  cách   N        N
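The rows of Table 1 appear to follow the first five columns of the CoNLL-X format (ID, FORM, LEMMA, coarse POS, fine POS), which in full also carries features, head, and relation columns. A minimal reader for such rows might look like the sketch below. Both the CoNLL-X interpretation and the helper name `read_conll` are assumptions; the reader splits on any whitespace, which only works if multi-syllable words are joined (for example with underscores).

```python
CONLL_X_COLUMNS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
                   "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def read_conll(text: str) -> list[dict[str, str]]:
    """Parse CoNLL-X style rows into one dict per token.

    Rows may carry fewer than ten columns (Table 1 shows only five);
    missing columns are simply absent from the token's dict.
    """
    tokens = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # blank lines separate sentences in CoNLL files; ignored in this sketch
        fields = line.split()
        tokens.append(dict(zip(CONLL_X_COLUMNS, fields)))
    return tokens


for token in read_conll("1 hỏi hỏi V V\n2 cách cách N N"):
    print(token)
```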

EXPERIMENTAL RESULTS

We built an evaluation dataset of 1,013 questions collected from several forums, such as vnzoom, tinhte.vn, and vatgia.com. The data focus on the technology domain. Meaningless questions were removed from the questions collected from the forums. The distribution of question types in the data is: Fact 33%, Solution 45%, Explanation 18%.
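Converting the reported shares into approximate question counts is a quick sanity check. The three percentages add up to 96%, so about 41 of the 1,013 questions are not covered by the reported figures; the text does not say whether this is due to rounding or to another category. The counts below are derived estimates, not numbers reported in the paper.

```python
TOTAL = 1013
SHARES = {"fact": 0.33, "solution": 0.45, "explanation": 0.18}  # percentages as reported in the text

# Approximate number of questions per type, rounded to whole questions.
counts = {label: round(TOTAL * share) for label, share in SHARES.items()}
covered = sum(counts.values())

print(counts)                         # {'fact': 334, 'solution': 456, 'explanation': 182}
print(f"{sum(SHARES.values()):.0%}")  # 96%
print(TOTAL - covered)                # 41
```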


Figure 6: Distribution of question types in the data

We use the following features:

- Part of speech (POS) of the word: POSW

- POS of the word's parent node (parent word): POSP

- Bag of words: BOW

- Common dependent words: CDW

- Relation between the word and its parent: RWP

All of the features above are extracted from the dependency tree. POS and BOW are standard natural language processing features. CDW is a boolean flag that marks the parent node of a word that is likely to be a keyword. RWP is extracted directly from the dependency tree; this group of features captures the relation between a word and its parent node.
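The five features can be read off a parsed question as in the sketch below, which produces one feature dictionary per word for a keyword/non-keyword classifier. The parse (heads, relation labels, POS tags) and the set of "common heads" used for CDW are hypothetical; the text does not say how that set is built. The function name `word_features` is also an assumption.

```python
def word_features(words, pos, heads, rels, common_heads):
    """One feature dict per word, using the feature names POSW, POSP, BOW, CDW, RWP."""
    rows = []
    for i, word in enumerate(words):
        h = heads[i]                                   # 1-based head index; 0 means the artificial ROOT
        parent = words[h - 1] if h > 0 else "ROOT"
        rows.append({
            "POSW": pos[i],                            # POS of the word
            "POSP": pos[h - 1] if h > 0 else "ROOT",   # POS of the parent word
            "BOW": word.lower(),                       # lexical (bag-of-words) feature
            "CDW": parent in common_heads,             # boolean: parent is a frequent keyword head
            "RWP": rels[i],                            # relation between the word and its parent
        })
    return rows


# Hypothetical parse and POS tags for "hỏi cách xóa tin_nhắn trên Iphone".
WORDS = ["hỏi", "cách", "xóa", "tin_nhắn", "trên", "Iphone"]
POS = ["V", "N", "V", "N", "E", "Np"]
HEADS = [0, 1, 2, 3, 3, 5]
RELS = ["root", "dob", "vmod", "dob", "loc", "pob"]
COMMON_HEADS = {"xóa"}  # hypothetical: parent words whose dependents are often keywords

for word, feats in zip(WORDS, word_features(WORDS, POS, HEADS, RELS, COMMON_HEADS)):
    print(word, feats)
```

Each dictionary could then be one-hot encoded and fed to a binary SVM that labels the word as keyword or non-keyword.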

Table 2: Keyword identification results using the dependency tree

Feature set                 Accuracy (SVM)
POSW + POSP                 …
POSW + POSP + BOW + CDW     …

CONCLUSION

This paper has reviewed research on question analysis worldwide and for Vietnamese. Our contributions are:

- Building a corpus for further research on Vietnamese.

- Showing that using dependency trees in Vietnamese question analysis is an acceptable solution (results in Table 2).

- Proposing a new question taxonomy suited to community-based question answering systems.

The accuracy of keyword identification depends on the accuracy of the dependency parsing method (71.66%, according to Nguyen et al. [13]).

In future work, we will consider combining document-level information such as IDF with the dependency-tree method to improve question classification.

REFERENCES

1. Babak Loni. A Survey of State-of-the-Art Methods on Question Classification. Published on TU Delft Repository, 2011.

2. Andrei Broder. "A taxonomy of web search." ACM SIGIR Forum, Vol. 36, No. 2. ACM, 2002.

3. Dai Quoc Nguyen, Dat Quoc Nguyen, Son Bao Pham. A Vietnamese Question Answering System. In Proc. 2009 International Conference on Knowledge and Systems Engineering (KSE), pp. 26-32, 2009.

4. Dang Hai Tran, Cuong Xuan Chu, Son Bao Pham, Minh Le Nguyen. Learning Based Approaches for Vietnamese Question Classification Using Keywords Extraction from the Web. In Proc. International Joint Conference on Natural Language Processing, pp. 740-746, Nagoya, Japan, 14-18 October 2013.

5. Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham, Phuong-Thai Nguyen, Minh Le Nguyen. From Treebank Conversion to Automatic Dependency Parsing for Vietnamese. In Proc. 19th International Conference on Application of Natural Language to Information Systems (NLDB'14), Springer-Verlag LNCS, 2014.

6. David A. Hull. Xerox TREC-8 question answering track report. In Voorhees and Harman, 1999.

7. Donald Metzler, W. Bruce Croft. Analysis of statistical question classification for fact-based questions. Inf. Retr., 8:481-504, May 2005.

8. Zhiheng Huang, Marcus Thint, Zengchang Qin. "Question classification using head words and their hypernyms." In Proc. Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2008.

9. João Silva, Luísa Coheur, Ana Mendes, Andreas Wichert. From symbolic to sub-symbolic information in question classification. Artificial Intelligence Review, 35(2):137-154, February 2011.

10. John Prager, Dragomir Radev, Eric Brown, Anni Coden. The use of predictive annotation for question answering in TREC8. In NIST Special Publication 500-246: The Eighth Text REtrieval Conference (TREC 8), pp. 399-411. NIST, 1999.

11. Hans Peter Luhn. "A statistical approach to mechanized encoding and searching of literary information." IBM Journal of Research and Development, 1(4) (1957): 309-317.

12. D. Moldovan, S. Harabagiu, M. Pasca, R. Mihalcea, R. Girju, R. Goodrum, V. Rus. The Structure and Performance of an Open-Domain Question Answering System. In Proc. Conference of the Association for Computational Linguistics (ACL-2000), pp. 563-570, 2000.

13. Nguyen Le Minh, Hoang Thi Diep, Tran Manh Ke. Nghiên cứu luật hiệu chỉnh kết quả dùng phương pháp MST phân tích cú pháp phụ thuộc tiếng Việt [A study of rules for correcting the results of the MST method for Vietnamese dependency parsing].

14. Joakim Nivre. "Dependency grammar and dependency parsing." MSI report 5133.1959 (2005): 1-32.

15. Poonam Gupta, Vishal Gupta. A Survey of Text Question Answering Techniques. International Journal of Computer Applications (0975-8887), Volume 53, No. 4, September 2012.

16. Stephen E. Robertson, K. Sparck Jones. "Relevance weighting of search terms." Journal of the American Society for Information Science 27(3) (1976).

17. Daniel E. Rose, Danny Levinson. Understanding user goals in web search. In Proc. 13th International Conference on World Wide Web. ACM, 2004.

18. Santosh Kumar Ray, Shailendra Singh, B. P. Joshi. A semantic approach for question classification using WordNet and Wikipedia. Pattern Recogn. Lett., 31:1935-1943, October 2010.

19. Ulf Hermjakob, Eduard Hovy, Chin-Yew Lin. Automated question answering in Webclopedia: a demonstration. In Proc. ACL-02, 2002.

20. Vu Mai Tran, Vinh Duc Nguyen, Oanh Thi Tran, Uyen Thu Thi Pham, Thuy Quang Ha. An Experimental Study of Vietnamese Question Answering System. In Proc. IALP 2009, pp. 152-155.

21. Xin Li, Dan Roth. Learning Question Classifiers. In Proc. International Conference on Computational Linguistics (COLING), 2002.

22. Dell Zhang, Wee Sun Lee. "Question classification using support vector machines." In Proc. 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2003.

23. G. Zhang, W. Zhang, Y. Bai, S. Kang, P. Wang. "An Open-domain Question Answering System for NTCIR-8 C-C Task." In Proc. NTCIR-8 Workshop Meeting, Tokyo, Japan.

24. Zhicheng Zheng, Yang Tang, Chong Long, Fan Bu, Xiaoyan Zhu. Question Answering System Based on Community QA. Workshop Programme Committee, 2012.

SUMMARY

VIETNAMESE QUESTION ANALYSIS IN A COMMUNITY-BASED QUESTION ANSWERING SYSTEM

Ngo Thi Lan¹*, Tran Hong Quan², Nguyen Thi Thanh Nhan¹, Le Thu Trang¹
¹College of Information and Communication Technology - TNU

²College of Technology - Vietnam National University, Hanoi

A Question Answering System (QAS) provides accurate and succinct answers to users' questions posed to the system in natural language. To improve the efficiency of such systems, current research aims to develop automatic QAS based on social communication services such as Yahoo! Answers, forums, wikis, and social networks. In particular, question analysis is an important part of the system and has attracted great attention from researchers. In this paper, we present an overview of research in question analysis and its potential application to Vietnamese.

We also propose a new question taxonomy suited to community-based question answering systems, and a dependency tree method to improve the efficiency of question classification.

Keywords: question answering, community-based question answering, dependency tree, question classification

Received: 16/10/2014; Reviewed: 04/11/2014; Accepted: 05/3/2015

Scientific reviewer: Dr. Nguyen Hai Minh - College of Information and Communication Technology - TNU

