Peringkasan Dokumen Berita COVID-19 Berbasis Deep Learning Menggunakan Arsitektur Transformer

(1)

Peringkasan Dokumen Berita COVID-19 Berbasis Deep Learning Menggunakan Arsitektur Transformer

Laporan Tugas Akhir

Diajukan Untuk Memenuhi Persyaratan Guna Meraih Gelar Sarjana Informatika Universitas Muhammadiyah Malang

Kharisma Muzaki Ghufron 201710370311079

Bidang Minat Data Science

PROGRAM STUDI INFORMATIKA FAKULTAS TEKNIK

UNIVERSITAS MUHAMMADIYAH MALANG 2020

(2)

i

(3)

ii

(4)

iii

(5)

iv ABSTRAK

Menghadapi dokumen berita penyebaran coronavirus disease 2019 (COVID-19) sangat menantang dikarenakan perlu waktu yang lama untuk mendapatkan informasi berguna dari konten berita. Deep learning telah mempunyai banyak kontribusi dalam bidang natural language processing, akan tetapi model pada penelitian sebelumnya masih memiliki kekurangan. Contoh hasil pada peringkasan text berulang dan berlebihan sehingga hasil kalimat yang dibentuk sulit untuk dirangkai dan nilai recall dari kalimat prediksi terhadap kalimat asli yang didapatkan rendah. Penelitian ini bertujuan untuk mengimplementasikan model deep learning dalam meringkas dokumen berita COVID-19. Dalam penelitian dilakukan pengusulan rancangan model berbasis Transformer yang sudah dimodifikasi untuk meningkatkan hasil peringkasan secara signifikan. Pada penelitian ini dilakukan modifikasi pada lapisan encoder dan decoder secara berulang kemudian membandingkan menggunakan score ROUGE. Hasil dari penelitian didapatkan model terbaik pada nilai score ROUGE-1 dan ROUGE-2 secara berturut-turut 0.58, dan 0.42 dengan waktu pelatihan 11438 detik. Model yang diusulkan dapat melakukan peringkasan abstraktif secara efektif pada dokumen berita.

Kata Kunci: COVID-19, NLP, Deep Learning, Peringkasan Berita, Arsitektur Transformer

(6)

v ABSTRACT

Facing the news on the internet about the spreading of coronavirus disease 2019 (COVID-19) is challenging because it is required a long time to get valuable information from the news. Deep learning has a significant impact on NLP research.

However, the deep learning models used in several studies, especially in document summary, still have a deficiency. The other results are redundant, or the characters repeatedly appeared so that the resulting sentences were less organized, and the recall value obtained was low. This study aims to summarize using a deep learning model implemented to COVID-19 the news documents. This research proposed the Transformer with architectural modification as the basis for designing the model to improve results significantly in document summarization. Moreover, author present Transformer-based architecture model with encoder and decoder that can be done several times repeatedly and make a comparison of layer modifications based on scoring. From the resulting experiment used, ROUGE-1 and ROUGE-2 show the good performance for the proposed model with scores 0.58 and 0.42, respectively, with a training time of 11438 seconds. The model proposed was evidently effective in improving result performance in abstractive document summarization.

Keywords: COVID-19, NLP, Deep learning, News summarization, Transformer architecture

(7)

vi

LEMBAR PERSEMBAHAN

Puji syukur kepada Allah SWT atas rahmat dan karunia-Nya sehingga penulis dapat menyelesaikan Tugas Akhir ini. Penulis menyampaikan ucapan terima kasih yang sebesar-besarnya kepada :

1. Ibu Nur Hayatin, S. ST., M.Kom dan Bapak Galih Wasis Wicaksono, S.Kom., M.Cs. selaku pembimbing tugas akhir.

2. Bapak/Ibu Dekan Fakultas Teknik Universitas Muhammadiyah Malang.

3. Ketua Program Studi Informatika Universitas Muhammadiyah Malang Ibu Hj. Gita Indah Marthasari, ST., M.Kom.

4. Kedua orang tua tercinta Bapak Abdul Malik dan Ibu Sri Winiarti.

5. Saudara keluarga saya Friska Ramadhani dan Ecky Rahmad yang selalu memberi dukungan support perkuliahan.

6. Dosen wali yang membimbing kuliah penulis dari awal memasuki perkuliahan Bapak Denar Regata Akbi, S.Kom., M.Kom.

7. Sahabat Asisten Lab. khususnya angkatan 2017 beserta jajaran Staff Laboran, Wakil Ketua dan Ketua Laboratorium Informatika Bapak Agus Eko Minarno, S.Kom., M.Kom. yang sudah memberikan dukungan fasilitas riset penelitian dan organisasi sebelumnya.

8. Rekan-rekan mahasiswa Informatika Angkatan 2017 (khususnya kelas B).

9. Jurnal Telkomnika yang telah melakukan review dan masukan pada penelitian ini.

10. Tim Google Colaboratory yang menyediakan fasilitas cloud untuk riset Deep Learning.

11. Geek people on Data Science. Your community and environment are impressive.

Malang, 23 Oktober 2020

Penulis

(8)

vii

KATA PENGANTAR

Dengan memanjatkan puji syukur kehadirat Allah SWT. Atas limpahan rahmat dan hidayah-NYA sehingga peneliti dapat menyelesaikan tugas akhir yang berjudul :

“Peringkasan Dokumen Berita COVID-19 Berbasis Deep Learning Menggunakan Arsitektur Transformer”

Di dalam tulisan ini disajikan pokok-pokok bahasan yang meliputi dataset berita dokumen berita COVID-19, rancangan modifikasi model berbasis Transformer, Hyperparameter pada model, dan perbandingan hasil performa model.

Peneliti menyadari sepenuhnya bahwa dalam penulisan tugas akhir ini masih banyak kekurangan dan keterbatasan. Oleh karena itu peneliti mengharapkan saran yang membangun agar tulisan ini bermanfaat bagi perkembangan ilmu pengetahuan.

Malang, 23 Oktober 2020

Penulis

(9)

viii DAFTAR ISI

HALAMAN JUDUL

BAB I ... 1

1.1 Latar Belakang ... 1

1.2 Rumusan Masalah ... 2

1.3 Tujuan Penelitian ... 2

1.4 Batasan Masalah ... 3

BAB II ... 4

2.1 Penelitian Terdahulu ... 4

2.2 Peringkasan Teks ... 5

2.3 Deep learning ... 5

2.4 Transformer ... 5

2.4.1 Positional Encoding ... 6

2.4.2 Encoder-Decoder Stack ... 7

2.4.3 Softmax ... 8

2.4.4 Attention ... 8

2.4.5 Scaled Dot-Product Attention ... 8

2.4.6 Multi-Head Attention ... 8

2.5 Word Embedding ... 9

2.6 Gaussian Error Linear Unit (GELU) ... 9

2.7 Rectified Linear Unit (ReLU) ... 9

2.8 Optimasi Adam ... 10

BAB III ... 11

3.1 Dataset ... 11

3.2 Preprocessing ... 12

3.3 Model Transformer ... 13

3.4 Lingkungan Kerja ... 14

3.5 Evaluasi Model ... 14

3.6 Skenario Pengujian ... 15

BAB IV ... 16

4.1 Load Embedding & Cek Kontraksi ... 16

4.2 Fungsi Cleaning ... 16

(10)

ix

4.3 Distribusi Kata ... 17

4.4 Hasil Distribusi Kata ... 18

4.5 Tokenisasi & Embedding Kata ... 19

4.6 Pemisahan Data ... 19

4.7 Bucket Dataset ... 21

4.8 Library Modelling & Import ... 21

4.9 Batch Dataset ... 22

4.10 Gelu, Dropout, Normalisasi ... 23

4.11 Encoder ... 24

4.12 Decoder ... 24

4.13 Build Model ... 27

4.14 Pelatihan Model ... 27

4.15 Validasi Model ... 27

4.16 Hasil Pelatihan Model ... 30

4.17 Komparasi Model ... 32

4.18 Word Cloud... 32

4.19 Distribusi Skor Model MTDTG 6 ... 35

BAB V ... 36

4.20 Kesimpulan ... 36

4.21 Saran ... 36

DAFTAR PUSTAKA ... 37

(11)

x

DAFTAR GAMBAR

Gambar 1. Dasar Arsitektur Transformer [10] ... 6

Gambar 2. Scaled Dot-Product Attention (Kiri), Multi-Head Attention (Kanan) [10] ... 7

Gambar 3. Grafik ReLU ... 10

Gambar 4. Skenario Eksperimen ... 11

Gambar 5. Model Transformer ... 13

Gambar 6. Source Code Load Embedding & Cek Kontraksi ... 16

Gambar 7. Source Code Embedding & Dataset, Eksekusi Cleaning ... 17

Gambar 8. Source Code Kontrol Distribusi ... 18

Gambar 9. Hasil Distribusi Pada Konten Berita ... 18

Gambar 10. Hasil Distribusi Pada Deskripsi Berita ... 18

Gambar 11. Source Code Tokenisasi dan Embedding Kata ... 19

Gambar 12. Source Code Tokenisasi dan Embedding Kata ... 20

Gambar 13. Hasil Pemisahan Dataset ... 20

Gambar 14. Source Code Bucket Sequence... 21

Gambar 15. Source Code Import Dokumen & Pustaka ... 22

Gambar 16. Source Code Batch Sequence ... 23

Gambar 17. Source Code GELU, Dropout, dan Normalisasi ... 24

Gambar 18. Source Code Encoder ... 25

Gambar 19. Source Code Decoder ... 26

Gambar 20. Source Code Pembangunan Model ... 27

Gambar 21. Source Code Pelatihan Model ... 28

Gambar 22. Source Code Validasi dan Simpan Checkpoint Model... 29

Gambar 23. Error Pelatihan Model ... 31

Gambar 24. Nilai Skor ROUGE-1 Pelatihan Model ... 31

Gambar 25. Source Code Test Model ... 32

Gambar 26. Source Code Word Cloud ... 33

Gambar 27. Word Cloud Model MTDTG-6 ... 33

Gambar 28. Source Code Skor Distribusi Pengujian MTDTG-6 ... 35

Gambar 29. Grafik Distribusi Pengujian MTDTG-6 ... 35

(12)

xi

DAFTAR TABEL

Tabel 1 Penelitian Terdahulu ... 4

Tabel 2 Tabel Token ... 11

Tabel 3 Lingkungan Kerja ... 13

Tabel 4 Skenario Pengujian ... 14

Tabel 5 Contoh Hasil Cleaning ... 16

Tabel 6 Contoh Konten Berita ... 29

Tabel 7 Hasil Perbandingan Ringkasan ... 29

Tabel 8 Hasil Uji Coba Model ... 31

Tabel 9 Representasi Kata Kunci ... 33

(13)

xii

(14)

xiii

DAFTAR PUSTAKA

[1] A. A. Salisu and X. V. Vo, “Predicting stock returns in the presence of COVID-19 pandemic: The role of health news,” Int. Rev. Financ. Anal., vol.

71, no. April, p. 135577, 2020, doi: 10.1016/j.scitotenv.2019.135577.

[2] A. K. M. N. Islam, S. Laato, S. Talukder, and E. Sutinen, “Misinformation sharing and social media fatigue during COVID-19: An affordance and cognitive load perspective,” Technol. Forecast. Soc. Change, vol. 159, no.

July, p. 120201, 2020, doi: 10.1016/j.techfore.2020.120201.

[3] T. Uçkan and A. Karcı, “Extractive multi-document text summarization based on graph independent sets,” Egypt. Informatics J., no. xxxx, 2020, doi:

10.1016/j.eij.2019.12.002.

[4] S. Gupta and S. K. Gupta, “Abstractive summarization: An overview of the state of the art,” Expert Syst. Appl., vol. 121, no. 2018, pp. 49–65, 2019, doi:

10.1016/j.eswa.2018.12.011.

[5] Y. Huang, Z. Yu, J. Guo, Z. Yu, and Y. Xian, “Legal public opinion news abstractive summarization by incorporating topic information,” Int. J. Mach.

Learn. Cybern., no. 0123456789, 2020, doi: 10.1007/s13042-020-01093-8.

[6] C. Yuan, Z. Bao, M. Sanderson, and Y. Tang, “Incorporating word attention with convolutional neural networks for abstractive summarization,” World Wide Web, vol. 23, no. 1, pp. 267–287, 2020, doi: 10.1007/s11280-019- 00709-6.

[7] A. K. Mohammad Masum, S. Abujar, A. Md Islam Talukder, A. K. M. S.

Azad Rabby, and S. A. Hossain, “Abstractive method of text summarization with sequence to sequence RNNs,” 2019 10th Int. Conf. Comput. Commun.

Netw. Technol. ICCCNT 2019, pp. 1–5, 2019, doi:

10.1109/ICCCNT45670.2019.8944620.

[8] P. M. Hanunggul and S. Suyanto, “The Impact of Local Attention in LSTM for Abstractive Text Summarization,” 2019 2nd Int. Semin. Res. Inf. Technol.

(15)

xiv

Intell. Syst. ISRITI 2019, pp. 54–57, 2019, doi:

10.1109/ISRITI48646.2019.9034616.

[9] B. Myagmar, J. Li, and S. Kimura, “Cross-Domain Sentiment Classification with Bidirectional Contextualized Transformer Language Models,” IEEE

Access, vol. 7, pp. 163219–163230, 2019, doi:

10.1109/ACCESS.2019.2952360.

[10] A. Vaswani et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst., vol. 2017-Decem, no. Nips, pp. 5999–6009, 2017.

[11] T. A. Fuad, M. T. Nayeem, A. Mahmud, and Y. Chali, “Neural sentence fusion for diversity driven abstractive multi-document summarization,”

Comput. Speech Lang., vol. 58, pp. 216–230, 2019, doi:

10.1016/j.csl.2019.04.006.

[12] J. Á. González, L. F. Hurtado, and F. Pla, “Transformer based contextualization of pre-trained word embeddings for irony detection in Twitter,” Inf. Process. Manag., vol. 57, no. 4, p. 102262, 2020, doi:

10.1016/j.ipm.2020.102262.

[13] J. W. Lin, Y. C. Gao, and R. G. Chang, “Chinese Story Generation with FastText Transformer Network,” 1st Int. Conf. Artif. Intell. Inf. Commun.

ICAIIC 2019, pp. 395–398, 2019, doi: 10.1109/ICAIIC.2019.8669087.

[14] Y. Iwasaki, A. Yamashita, Y. Konno, and K. Matsubayashi, “Japanese abstractive text summarization using BERT,” Proc. - 2019 Int. Conf.

Technol. Appl. Artif. Intell. TAAI 2019, no. 1, 2019, doi:

10.1109/TAAI48200.2019.8959920.

[15] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. 2014.

[16] D. Anand and R. Wagh, “Effective deep learning approaches for summarization of legal texts,” J. King Saud Univ. - Comput. Inf. Sci., no.

xxxx, 2019, doi: 10.1016/j.jksuci.2019.11.015.

[17] N. H. Tien, N. M. Le, Y. Tomohiro, and I. Tatsuya, “Sentence modeling via multiple word embeddings and multi-level comparison for semantic textual

(16)

xv

similarity,” Inf. Process. Manag., vol. 56, no. 6, p. 102090, 2019, doi:

10.1016/j.ipm.2019.102090.

[18] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” 2014. doi: 10.3115/v1/d14-1162.

[19] N. Alami, M. Meknassi, and N. En-nahnahi, “Enhancing unsupervised neural networks based text summarization with word embedding and ensemble learning,” Expert Syst. Appl., vol. 123, pp. 195–211, 2019, doi:

10.1016/j.eswa.2019.01.037.

[20] D. Hendrycks and K. Gimpel, “Gaussian Error Linear Units (GELUs),” pp.

1–9, 2016, [Online]. Available: http://arxiv.org/abs/1606.08415.

[21] A. F. Agarap, “Deep Learning using Rectified Linear Units (ReLU),” no. 1, pp. 2–8, 2018, [Online]. Available: http://arxiv.org/abs/1803.08375.

[22] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,”

3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., pp. 1–15, 2015.

[23] S. A. Alsaidi, A. T. Sadiq, and H. S. Abdullah, “English poems categorization using text mining and rough set theory,” Bull. Electr. Eng.

Informatics, vol. 9, no. 4, pp. 1701–1710, 2020, doi: 10.11591/eei.v9i4.1898.

[24] “COVID-19 News Articles Open Research Dataset | Kaggle.”

https://www.kaggle.com/ryanxjhan/cbc-news-coronavirus-articles-march- 26 (accessed Jul. 22, 2020).

[25] L. Ruhwinaningsih and T. Djatna, “A Sentiment Knowledge Discovery Model in Twitter’s TV Content Using Stochastic Gradient Descent Algorithm,” TELKOMNIKA (Telecommunication Comput. Electron.

Control., vol. 14, no. 3, p. 1067, 2016, doi:

10.12928/telkomnika.v14i3.2671.

[26] M. A. Fauzi, R. F. N. Firmansyah, and T. Afirianto, “Improving sentiment analysis of short informal Indonesian product reviews using synonym based feature expansion,” Telkomnika (Telecommunication Comput. Electron.

(17)

xvi

Control., vol. 16, no. 3, pp. 1345–1350, 2018, doi:

10.12928/TELKOMNIKA.v16i3.7751.

[27] M. E. J. Newman, “Power laws, Pareto distributions and Zipf’s law,”

Contemp. Phys., vol. 46, no. 5, pp. 323–351, 2005, doi:

10.1080/00107510500052444.

[28] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer Normalization,” 2016, [Online]. Available: http://arxiv.org/abs/1607.06450.

[29] J. Mallinson, R. Sennrich, and M. Lapata, “Sentence compression for arbitrary languages via multilingual pivoting,” Proc. 2018 Conf. Empir.

Methods Nat. Lang. Process. EMNLP 2018, no. 2015, pp. 2453–2464, 2020, doi: 10.18653/v1/d18-1267.

[30] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin,

“Convolutional sequence to sequence learning,” 34th Int. Conf. Mach.

Learn. ICML 2017, vol. 3, pp. 2029–2042, 2017.

[31] C. Y. Lin, “Rouge: A package for automatic evaluation of summaries,” Proc.

Work. text Summ. branches out (WAS 2004), no. 1, pp. 25–26, 2004, [Online]. Available: papers2://publication/uuid/5DDA0BB8-E59F-44C1- 88E6-2AD316DAEF85.