• Tidak ada hasil yang ditemukan

Language Modelling for a Low-Resource Language in Sarawak, Malaysia

N/A
N/A
Protected

Academic year: 2024

Membagikan "Language Modelling for a Low-Resource Language in Sarawak, Malaysia"

Copied!
2
0
0

Teks penuh

(1)

Advances in Electronics Engineering pp 147-158| Cite as

Language Modelling for a Low-Resource Language in Sarawak, Malaysia

Authors

Authors and affiliations

Sarah Samson Juan

Muhamad Fikri Che Ismail

Hamimah Ujir

Irwandi Hipiny

o o

o

o

(2)

o

1. 1.

2. 2.

Conference paper

First Online: 17 December 2019

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 619)

Abstract

This paper explores state-of-the-art techniques for creating language models in low- resource setting. It is known that building a good statistical language model requires a large amount of data. Therefore, models that are trained on low-resource language suffer from poor performances. We conducted a study on current language modelling techniques such as n-gram and recurrent neural network (RNN) to observe their

outcomes on data from a language in Sarawak, Malaysia. The target language is Iban, a widely spoken language in this region. We have collected news data form an online source to build an Iban text corpus. After normalising the data, we trained trigram and RNN language models and tested on automatic speech recognition data. Based on our results, we observed that the RNN language models did not significantly outperform the trigram language models. A slight improvement on RNN model is seen after the size of the training data was increased. We have also experimented on merging n-gram and RNN language models and we obtained 32.33% improvement after using a trigram- RNN language model.

Keywords

Low-resource language n-gram language model Recurrent neural network language model

Referensi

Dokumen terkait

A study on the presence and relative abundance of benthic harmful algal bloom (BHAB) forming dinoflagellate species was carried out in the coral reefs of Sampadi Island,

The above findings may suggest that the lack of effective public education on organ donation may be the main cause of low organ donation rate and organ shortage in Malaysia; and

The effects of drama-based activities as a language learning tool on learners’ motivation in Non-Malay-medium national schools in Malaysia ABSTRACT Transitioning from

Modelling on Faculty of Engineering FKJ students readiness in implementation of parking charging system at Universiti Malaysia Sabah UMS ABSTRACT Transportation mode choice by

E-learning readiness among English language teachers in Sabah, Malaysia: A pilot study ABSTRACT This paper aims to assess the reliability of the instrument adapted to identify the

2013 IEEE Conference on Open Systems ICOS, December 2 - 4, 2013, Sarawak, Malaysia Probabilistic neural network based olfactory classification for household burning in early fire

Rashid3 Nadeeya ‘Ayn Umaisara Mohamad Nor4 Faizul Helmi Addnan5 Anuar Sani6 Nizam Baharom7 1Universiti Sains Islam Malaysia, Negeri Sembilan, Malaysia E-mail: [email protected]

Effects of Enrichment Planting on the Soil Physicochemical Properties at Reforestation Sites Planted with Shorea macrophylla de Vriese in Sampadi Forest Reserve, Sarawak, Malaysia