
An intelligent categorization tool for Malay research articles


Academic year: 2024


AN INTELLIGENT CATEGORIZATION TOOL FOR MALAY RESEARCH ARTICLES

GRANT NO: RAG0008-TK-2012

Mohd Norhisham Bin Razali, Rayner Alfred and Chin Kim On

Final Report

Faculty of Computing and Informatics, Universiti Malaysia Sabah

Malaysia

2015



Abstract

Unlabeled research articles published in the Malay language are becoming increasingly common and available in Malaysia, and manually indexing them is difficult and time-consuming. To facilitate research activities that depend on resources written in Malay, these articles must be categorized or indexed efficiently so that appropriate and relevant domains of knowledge can be recommended to researchers in Malaysia. Few studies have addressed the efficient categorization of Malay research articles. The task is more complex than categorizing English research articles because of the complexity of the Malay language, and it therefore represents a major contemporary challenge. Malay text documents are often represented as high-dimensional, sparse vectors that use Malay words as features; a few thousand dimensions and a sparsity of 95 to 99% are typical. Determining the appropriate number of categories for a large collection of Malay documents is also difficult and time-consuming because of this sparsity. If too many categories are assigned, related documents may be split across different clusters; if too few are assigned, unrelated documents may be grouped into the same cluster. This research addresses the improvement of several pre-processing steps that affect clustering performance: stemming, part-of-speech tagging and named-entity recognition. In this work, the effects of improving all of these steps are investigated. It is anticipated that improving the clustering results will also improve the mapping between the Malay and English clusters obtained from bilingual clustering. By increasing the mapping percentage for the bilingual clusters, a more robust algorithm for clustering bilingual documents can be developed. In this study, a genetic algorithm (GA) is also proposed to determine the set of terms that can be used to cluster bilingual documents more effectively.
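To make the sparsity figure concrete, the following is a minimal sketch of a bag-of-words term-document matrix and its sparsity, that is, the fraction of zero entries. It is not the report's implementation: the function names `term_document_matrix` and `sparsity`, the whitespace tokenizer and the three sample Malay phrases are all assumptions introduced here for illustration.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i.

    Tokenization is a plain lowercase whitespace split (an assumption);
    no stemming, tagging or entity recognition is applied.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts.get(term, 0) for term in vocabulary])
    return vocabulary, rows


def sparsity(rows):
    """Fraction of zero entries in the term-document matrix."""
    total = sum(len(row) for row in rows)
    zeros = sum(1 for row in rows for value in row if value == 0)
    return zeros / total if total else 0.0


# Hypothetical sample phrases, not taken from the report's corpus.
docs = [
    "kajian ini mengkaji pengelasan dokumen",
    "algoritma genetik memilih istilah",
    "pengelasan artikel penyelidikan bahasa melayu",
]

vocabulary, rows = term_document_matrix(docs)
print(f"{len(vocabulary)} terms, sparsity = {sparsity(rows):.3f}")
```

Even in this three-document toy corpus most entries are zero, because each document uses only a small part of the shared vocabulary. With a few thousand distinct terms across real articles, the same effect would plausibly produce the 95 to 99% sparsity described above.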
