Publication Repository

(1)

Self-Learning System With

Natural Question-Guided Search

For Narration In Indonesian Language

Edwin Wibowo Sampurna#1_{, Esther Irawati Setiawan}*2 #_{Global Business, Chinese Culture University}

55, Hwa-Kang Road, Taipei, Taiwan R.O.C 11114

1_{edwin.sampurna@gmail.com} *

Sekolah Tinggi Teknik Surabaya

Ngagel Jaya Tengah 73-77, Surabaya, Indonesia

2

esther.irawati@gmail.com

Abstract The need of self-learning education is quite high recently. Peoples find how to do learning education using the Natural Question-Guided Search algorithm to make a question that can form a question along with answers that can be used to improve the self-learning understanding. Natural Question-Guided Search is an algorithm used to make a question from a narrative sentence.

This Natural Question-Guided Search utilizes a Dependency Parser to process a sentence. Dependency Parser is a syntactical analysis process of a sentence which assume that a word is dependent on other words (head). This relation shows grammatical function between words which cannot be shown by constituency-based parser that can only identify phrase. By knowing grammatical function between those words, it will ease the creation of a question.

A test is done to determine the accuracy of the application of this Natural Question-Guided Search algorithm with applying it to the school reading text book to create a question.

Keywords natural language processing, natural question-guided search, dependency parser, constituency parser, bahasa Indonesia

1. I

NTRODUCTION

Nowadays, the knowledge obtained from the learning process is really important. In addition to reading, comprehension of the text can be reader's understanding of the text.

With the technology that we know is increasing everyday, it is needed a system that can make such a question directly from an existing narrative texts, so that the reader will get a question and can answer any questions after reading the narrative text in order to improve understanding and comprehension of the content of the text. With that way, the reader can study or learn a text easily by themselves.

Natural Question-Guided Search algorithm is require to generate a question. Natural Question-Guided Search is an algorithm developed by Alexander Kotov and ChengXiangZai by using Dependency Parsing to process a sentence and generate a tree as the output. This output is Dependency Tree which further will be processed by Natural Question-Guided Search algorithm to generate a question. Those questions can be use for self-learning education that can help reader to study or learn a text by themselves.

2. F

UNDAMENTAL

2.1. Indonesian Language Pattern

The same with English, Indonesian Language Pattern also consist of phrase, clause and sentence.

Clause is consist of two or more words that make up a construction containing elements of the predicative, and has the potential to be a sentence.

(2)

Sentence is a group of words that are put together to mean something which expresses a complete thought. It does this by following the grammatical rules of syntax. There are four type of Sentence: Simple Sentence, Compound Sentence, Complex Sentence and Complex-Compound Sentence. But on this research, it is focus only on Simple Sentence.

2.2. Natural Language Processing

Natural Language Processing (NLP) that is a part of computer science, Artificial Intelligence (AI), and Linguistics which deals with the interaction between human and computer so that the computer has the ability to be able to understand natural human language.

There are four NLP that used for this research : Part-of-Speech Tagging (POS Tagging), Named Entity Recognition, Constituency Parsing dan Dependency Parsing.

2.2.1. Part-Of-Speech Tagging

Part-of-Speech Tagging (POS Tagging) is the process of determining the words according to the grammar for each of the words in the sentences of natural language. POS Tagging is also can provide information of the word from syntactic or morphology of a sentence.

This research use Indonesian Language POS Tagging (iPOSTagger) that made by Alfan Farizki Wicaksono and Ayu Purwarianti. iPOSTagger is very important in this research, especially in classified and tagging the words on a sentence.

2.2.2. Named Entity Recognition

Named Entity Recognition (NER) is one of the components of information extraction to detect and classify the named-entity in a text. NER is generally used to detect people's names, place names and organization of a document.

This research use LingPipe NER for English.

in this reasearch, so that LingPipe NER can be use. While for detect the other names will be use some addition rules for categorizing.

2.2.3. Constituency Parsing

Constituency Parsing or Phrase Structure Grammar is a parser used in NLP to parse a

Grammar Pattern). Constituency Parser using the grammar rules to generate Constituency Tree of a sentence so that it becomes a model of grammar patterns.

But Constituency Parsing is not developed in this research. This research use Constituency Parsing by Stanford Parser to generate a Constituency Tree. Stanford Parser is an English parser. Since Indonesian Language has the similarities with English in terms of sentence structure, the Stanford Parser can be used.

2.2.4. Dependency Parsing

Dependency Parsing is a parser which generates a grammar that describes the dependence between the components which one is the head and the other is dependent. Head also called modifier as a determinant for the partner.

Dependency Parsing will be done using the method of mapping from Constituent Structure to Dependencies Structure, because the input to this parser is a constituent-based sentence that is output from last process.

2.3. Natural Question-Guided Search

Natural Question-Guided Search is an algorithm that can transform a narrative text into a question. The first idea of development this and faster result. In general, it is conceivable that Natural Question-Guided Search algorithm is very useful to for search an information by generate a good question.

But in this research, Natural Question-Guided Search algorithm not use to generate a question that use to search an information to get a better and faster result, but this algorithm is only use to generate a question with the answer, so it can be use for self-learning study.

There are two early stage should be done before running this algorithm, Pre-processing stage, and the second is Dependency Parsing. Furthermore, Dependency Tree (output from Dependency Parsing) will be processed by Question-Guided Natural Search algorithm.

(3)

Fig. 1 Question Form Phase Processing Diagram

3. Q

UESTION

F

ORM

P

ROCESSING

The process of Question Form Processing is processed through two main stages, Analysis Solution Dependency Parsing and Natural Question-Guided Search.

The first stage of Question Form Processing is Analysis Solution Dependency Parsing. On this stage the system will be mapping a Constituent Tree Structure into Dependency Tree Structure. This stage can be described in a process diagram as illustrated in Fig. 2.

Fig. 2 Analysis Solution Dependency Parsing Process Diagram

Can be seen in Fig. 2, the input process is an Indonesian Language which will be processed by the Pre-processing Module. The results of that module is a Constituency Tree which will be processed by the Dependency Parsing Module. The result of this system is the Dependency Tree.

The first process Analysis Solution Dependency Parsing is Pre-processing that consisting of POS Tagging and Parsing Constituency.

Fig. 3 Example Input and Output POS Tagging

Furthermore, the output of POS Tagging will be processed using Constituency Parsing to produce Constituency Tree.

Fig. 4 Example Input and Output Constituency Parsing

The next process is Dependency Parsing module that mapping the Constituent Structure. There are two phase to process and mapping it into Dependency Tree:

1. Dependency Extraction

This phase requires a Head Rule for each phrase structure. This module uses Head Rule of Collin Head Rule which can be implemented using the following algorithm:

FOR EACH _cfg IN _{cfg_list}

IF NOT _pos(cfg.lhs) THEN

head get_head_pos(cfg.lhs)

FOR EACH _node IN _cfg.rhs

IF _{node <> head} THEN

IF NOT _pos(node) THEN

dependent get_head_pos (node)

ELSE

dependent node

add_relation(head,dependent)

2. Dependency Typing

This phase requires a Dependencies Label. This module uses the label of Stanford dependencies that have been adapted to Indonesian Language.

The result of this process is in the form of a Dependency tree is Dependency Tree with a label that shows the relationship between words.

Input:

Di/IN Indonesia/NNP terdapat/VBT sekitar/CDI 665/CDP bahasa/NN daerah./.

Output:

(ROOT (S

(NP (NNP Di) (NNP Indonesia)) (VP (VBP terdapat)

(NP

(QP (CD sekitar) (CD 665)) (NN bahasa) (NN daerah))) (. .)))

Input:

Di Indonesia terdapat sekitar 665 bahasa daerah.

Output:

Di/IN Indonesia/NNP terdapat/VBT sekitar/CDI 665/CDP bahasa/NN daerah./.

(4)

Fig. 5 Dependency Tree

The second stage of Question Form Processing is Natural Question-Guided Search that will generate a question and the answer.

On this stage the system will do some process such as Answer Searching, Define Question Type, and the last is Question Forming. This stage can be described in a process diagram as illustrated in Fig. 6.

Fig. 6 Natural Question-Guided Search Process Diagram

Can be seen in Fig. 5, the input process Question-Guided Natural Search is a Dependency Tree that is output of the previous stage, Analysis Solution Dependency Parsing.

Natural Question-Guided Search process can be implemented using the following algorithm:

[Answer Search]

FOR EACH _child IN _{root_left_child /}

root_right_child answer get_deepest_child()

[Question Forming]

FOR EACH _word IN _sentence

IF _{word <> answer} THEN

question question_mark + word +

The first process is the process to finding answers by tracing each of the left and right child of the root in the Dependency Tree that obtained from the previous stage. The process will be done after the system got the deepest node of both side. The results of the Answer Searching Process can be seen in Table 1.

TABLE 1

ANSWERSEARCHRESULT

L/R Deepest Node Head Answer

Left Di Indonesia Di

Indonesia

Right sekitar 665 Sekitar 665

The second process is to define the Question Type. The system will categorize each answer that obtained from the first process to get the question type. This categorization process is done by using Named Entity Recognition that using some addition rules to recognize the word category of each answer and get the question type. The results of the Define Question Type can be seen in Table 2.

TABLE 2

DEFINEQUESTIONTYPERESULT

L/R Answer Category Question Type

Left Di

Indonesia

Places Dimana

Right sekitar

665

Numeric Berapa

(5)

TABLE 3

QUESTIONFORMINGRESULT

Question Answer

Dimana terdapat sekitar 665 bahasa

daerah?

Di Indonesia

Berapa di Indonesia terdapat bahasa

daerah?

sekitar 665

From those process will obtained two question resulting from the left side and the right side of the answers. Each question always have an answer to that question.

4. C

ONCLUSIONS

This research obtained some conclusions that will be explained as follows:

1. Constituency Tree can be mapped to be Dependency Tree using Constituency Parser, Dependency Label and Head Rule. 2. Question Type error due to the lack of

rules at the process of categorizing the answer that obtained on Define Question Type process.

3. Dependency Parsing using mapping method relies heavily on the Constituency Tree that generated from Constituency Parser with the input from POS Tagger. 4. Dependency Parser error using mapping

method from phrase structure to dependency structure occurs in two phase: Constituency Parser and CFG. Error on the Constituency Parser phase is because the parser using a parser for English by giving a sentence that have been tagged with Indonesian Language structure as an input. Error on the CFG phase is because the CFG is a Constituency Tree generated from the Constituency Parser.

A

CKNOWLEDGMENT

This research can be used for further research. Here are some suggestions regarding the development of this research.

1. One of the obstacle on this research is the limited of Constituency Parser for Indonesian Language, so that errors happen dominoly from the Pre-process phase until the result of this research. The possible solution is to develop more for Constituency Parser for Indonesian Language start from Indonesian Language

Tree Bank to obtain a good Indonesian CFG rule.

2. The limited resource and paper about Natural Question-Guided Search also one of the obstacle to do this research. Research for this Natural Question-Guided Search algorithm still need to develop more in the future.

3. Head Rule that used for mapping from Phrase Structure into Dependency Structure still need more testing and validation by an Indonesian Language expert. Because Head Rule that used on this research is Collin Head Rule for English.

4. Labeling rules need to be more specific to avoid ambiguitous, not only for POS Tagging labelling process, but also need special handling for passive verbs.

R

EFERENCES

[1] Afif, Irfan, Studi Perbandingan Kinerja Algoritma CYK dan Algoritma Earley pada Pengurai Kalimat Menggunakan Probabilistic Context Free Grammar Bahasa Indonesia Sederhana, Tugas Akhir Sarjana Teknik Informatika Institut Teknologi Bandung. 2011.

[2] Alexander Kotov, ChengXiang Zhai. Towards Natural Question-Guided Search, 2010.

[3] Alfan Farizki Wicaksono dan Ayu Purwarianti. HMM Based Part-of-Speech Tagger for Bahasa Indonesia, Tugas Akhir Sarjana Teknik Informatika Institut Teknologi Bandung, 2010.

[4] Bies, Ann et. Al., Bracketing Guidelines for Treebank II Style Penn Treebank Project. Linguistic Data Consortium, 1995.

[5] Covington, Michael A, Important Additional Notes about Dependency Parsing. The University of Georgia, 2004.

[6] Debusmann, Ralph, An Introduction to Dependency Grammar. Universitat des Saarlandes, 2000.

[7]

Concordance untuk Alkitab Perjanjian Baru Bahasa Indonesia, Sekolah Tinggi Teknik Surabaya, 2008.

[8] Jessica Felani Wijoyo, Natural Language Processing: Lecture 1. Introduction to Natural Language Processing, Sekolah Tinggi Teknik Surabaya, 2011.

(6)

[10] Keraf, Gorys, Tatabahasa Indonesia untuk SMK dan Umum. Penerbit Nusa Indah, 1991.

[11] Robinson, J., Dependency structure and tranformational rules. Language, 2/46. 1970. [12] Sukamto, Rosa Ariani, Penguraian Bahasa Indonesia dengan Menggunakan Pengurai

http://www.cs.columbia.edu/~mcollins/pape rs/heads, 1995.

[14] _______, Stanford Parser,