
CHAPTER 5 CONCLUSIONS AND SUGGESTIONS

5.2 Suggestions

The following suggestions may be considered for future research:

1. The dataset used for sentiment analysis with BERT should contain a balanced number of positive, negative, and neutral samples (a balancing sketch is given after this list).

2. Use an Indonesian-language BERT model, namely IndoBERT (Koto et al., 2020) or IndoBERT (Wilie et al., 2020), together with a larger and more balanced dataset, to obtain better results (a loading sketch is given after this list).

3. Use a larger dataset to obtain better accuracy.
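As a minimal sketch of suggestions 1 and 2 (not part of this research), the snippet below upsamples the minority classes with scikit-learn and loads an IndoBERT checkpoint through the Hugging Face transformers library. The DataFrame df and its Sentiment column are assumed from the appendix code, and the checkpoint identifiers are the published names for Wilie et al. (2020) and Koto et al. (2020); verify them before use.

import pandas as pd
from sklearn.utils import resample
from transformers import AutoTokenizer, AutoModel

# Suggestion 1: upsample every class to the size of the largest one
# (assumes a DataFrame `df` with a `Sentiment` column, as in the appendix).
max_size = df['Sentiment'].value_counts().max()
df_balanced = pd.concat([
    resample(group, replace=True, n_samples=max_size, random_state=42)
    for _, group in df.groupby('Sentiment')
])

# Suggestion 2: swap the multilingual checkpoint for an Indonesian one.
# Both identifiers below are assumptions to be verified against the hub.
PRE_TRAINED_MODEL_NAME = 'indobenchmark/indobert-base-p1'   # Wilie et al. (2020)
# PRE_TRAINED_MODEL_NAME = 'indolem/indobert-base-uncased'  # Koto et al. (2020)

tokenizer = AutoTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)
bert_model = AutoModel.from_pretrained(PRE_TRAINED_MODEL_NAME)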

REFERENCES

Abdul, S., Qiang, Y., Basit, S., & Ahmad, W. (2019). Using BERT for Checking the Polarity of Movie Reviews. International Journal of Computer Applications, 177(21), 37–41. https://doi.org/10.5120/ijca2019919675

Aggarwal, C. C. (2018). Neural Networks and Deep Learning. In Artificial Intelligence. Springer. https://doi.org/10.1201/b22400-15

Alammar, J. (2018). The Illustrated Transformer. https://jalammar.github.io/illustrated-transformer/

Aliyah Salsabila, N., Ardhito Winatmoko, Y., Akbar Septiandri, A., & Jamal, A. (2019). Colloquial Indonesian Lexicon. Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018, 226–229. https://doi.org/10.1109/IALP.2018.8629151

Chollet, F. (2018). Deep Learning with Python. Manning Publications. http://faculty.neu.edu.cn/yury/AAI/Textbook/Deep Learning with Python.pdf

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 1, 4171–4186.

Ghosh, S., Roy, S., & Bandyopadhyay, S. K. (2012). A tutorial review on Text Mining Algorithms. International Journal of Advanced Research in Computer and Communication Engineering, 1(4), 223–233. www.ijarcce.com

Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing (Vol. 10). Morgan & Claypool Publishers. https://dx.doi.org/10.1162/COLI_r_00312

Herfianto. (2020). Prestasi dan Penghargaan Film Gundala. Liputan6.com. https://www.liputan6.com/showbiz/read/4155178/prestasi-dan-penghargaan-film-gundala

Jurafsky, D., & Martin, J. H. (2019). Speech and Language Processing (3rd ed.). Prentice Hall. https://doi.org/10.4324/9780203461891_chapter_3

Kemp, S. (2020). Digital 2020: Indonesia. We Are Social & Hootsuite. https://datareportal.com/reports/digital-2020-indonesia

Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 1–15.

Koto, F., Rahimi, A., Lau, J. H., & Baldwin, T. (2020). IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. http://arxiv.org/abs/2011.00677

Maharani, W. (2020). Sentiment Analysis during Jakarta Flood for Emergency Responses and Situational Awareness in Disaster Management using BERT. 2020 8th International Conference on Information and Communication Technology, ICoICT 2020. https://doi.org/10.1109/ICoICT49345.2020.9166407

Munikar, M., Shakya, S., & Shrestha, A. (2019). Fine-grained Sentiment Classification using BERT. International Conference on Artificial Intelligence for Transforming Business and Society, AITB 2019, 1, 1–5. https://doi.org/10.1109/AITB48515.2019.8947435

Osinga, D. (2018). Deep Learning Cookbook. O'Reilly Media.

Pekel, E., & Kara, S. S. (2017). A Comprehensive Review for Artificial Neural Network Application to Public Transportation. Sigma Journal of Engineering and Natural Sciences, 35(1), 157–179.

Putri, C. A. (2020). Analisis Sentimen Review Film Berbahasa Inggris Dengan Pendekatan Bidirectional Encoder Representations from Transformers. JATISI (Jurnal Teknik Informatika Dan Sistem Informasi), 6(2), 181–193. https://doi.org/10.35957/jatisi.v6i2.206

Song, Y., Wang, J., Liang, Z., Liu, Z., & Jiang, T. (2020). Utilizing BERT intermediate layers for aspect based sentiment analysis and natural language inference. arXiv.

Sun, C., Qiu, X., Xu, Y., & Huang, X. (2019). How to Fine-Tune BERT for Text Classification? Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11856 LNAI(2), 194–206. https://doi.org/10.1007/978-3-030-32381-3_16

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 2017-December, 5999–6009.

Wilie, B., Vincentio, K., Winata, G. I., Cahyawijaya, S., Li, X., Lim, Z. Y., Soleman, S., Mahendra, R., Fung, P., Bahar, S., & Purwarianti, A. (2020). IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding. http://arxiv.org/abs/2009.05387

Xu, H., Liu, B., Shu, L., & Yu, P. S. (2019). BERT post-training for review reading comprehension and aspect-based sentiment analysis. NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 1, 2324–2335.

Xu, S., Li, Y., & Wang, Z. (2017). Bayesian Multinomial Naïve Bayes Classifier to Text Classification. 15. https://doi.org/10.1007/978-981-10-5041-1

Yanuar, M. R., & Shiramatsu, S. (2020). Aspect Extraction for Tourist Spot Review in Indonesian Language using BERT. 2020 International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2020, 298–302. https://doi.org/10.1109/ICAIIC48513.2020.9065263

APPENDICES

Appendix 1. Source Code

1. Scraping Dataset

# (imports reconstructed for this scraping script)
import csv
import time
from selenium.webdriver.common.keys import Keys

# Scroll to the page bottom repeatedly so more comments load.
html.send_keys(Keys.END)
time.sleep(SCROLL_PAUSE_TIME)

# Collect the loaded comment elements. The original line was truncated;
# this XPath for YouTube comment text is an assumption.
comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')

# Write the scraped comments to file; the newline delimiter puts one item per line.
with open(filename, 'w', encoding="utf-16") as file:
    csv_writer = csv.writer(file, delimiter='\n')
    csv_writer.writerow(header)
    csv_writer.writerow(all_comments)

2. Sentence Splitting Dataset


from nltk.tokenize import sent_tokenize
import pandas as pd

# Split each comment into sentences, keeping (original text, sentence) pairs.
sentences = []
for row in dataset.itertuples():
    for sentence in sent_tokenize(row[1]):
        sentences.append((row[1], sentence))

new_df = pd.DataFrame(sentences, columns=['Text', 'Comment'])

# Plot the sentiment class distribution.
nama_sentimen = ['negative', 'neutral', 'positive']
ax = sns.countplot(df.Sentiment)
plt.xlabel('Opini terhadap Gundala')
ax.set_xticklabels(nama_sentimen);

# Preprocessing: lowercase the text and strip URLs.
def text_preprocessing(text):
    text = text.lower()
    # Remove URLs. The original regex was truncated; this simpler
    # pattern is a reconstruction with the same intent.
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    return text

# Load an additional stopword list. The read_csv call was truncated;
# the file path below is a placeholder.
txt_stopwords = pd.read_csv("stopwords.txt", names=["stopwords"], header=None)
list_stopwords = stopwords.words('indonesian')

# Keep intensifiers that carry sentiment.
list_stopwords.remove("sangat")
list_stopwords.remove("sekali")

# Merge both stopword lists and deduplicate.
list_stopwords.extend(txt_stopwords["stopwords"][0].split(' '))
list_stopwords = set(list_stopwords)

# Tokenize, remove stopwords, and stem each comment with Sastrawi.
def preprocess_lagi(text):
    lst_text = word_tokenize(text)
    lst_text = [word for word in lst_text if word not in list_stopwords]
    factory = StemmerFactory()
    stemmer = factory.create_stemmer()
    lst_text = [stemmer.stem(word) for word in lst_text]
    # Return the token list; the normalization functions below expect tokens.
    return lst_text

# Build a slang-to-standard-word dictionary from kamus_alay_moza.
normalize_word_dict = {}
for index, row in kamus_alay_moza.iterrows():
    if row[0] not in normalize_word_dict:
        normalize_word_dict[row[0]] = row[1]

def normalisasi_kata(text):
    # Replace each token with its normalized form when it appears in the dictionary.
    return [normalize_word_dict[term] if term in normalize_word_dict else term
            for term in text]

# Build a second normalization dictionary from kamus_alay.
normalize_word_dict_dua = {}
for index, row in kamus_alay.iterrows():
    if row[0] not in normalize_word_dict_dua:  # skip duplicate entries
        normalize_word_dict_dua[row[0]] = row[1]

def normalisasi_kata_dua(text):
    # Normalize each token against the second dictionary.
    return [normalize_word_dict_dua[term] if term in normalize_word_dict_dua else term
            for term in text]

3. BERT

# Use the multilingual cased BERT checkpoint as the base model.
PRE_TRAINED_MODEL_NAME = 'bert-base-multilingual-cased'

# Hold out 10% of the data for testing.
df_train, df_test = train_test_split(df, test_size=0.1)

class GPReviewDataset(Dataset):
    def __init__(self, reviews, targets, tokenizer, max_len):
        self.reviews = reviews  # (the original line was cut after '=')
        self.targets = targets
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.reviews)

    def __getitem__(self, item):
        review = str(self.reviews[item])
        target = self.targets[item]

        # Tokenize one review, padding/truncating to max_len.
        encoding = self.tokenizer.encode_plus(
            review,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            padding='max_length',
            return_attention_mask=True,
            return_tensors='pt',
        )

        return {
            'review_text': review,
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'targets': torch.tensor(target, dtype=torch.long)
        }

def create_data_loader(df, tokenizer, max_len, batch_size):
    ds = GPReviewDataset(
        reviews=df.Comment.to_numpy(),
        targets=df.Sentiment.to_numpy(),
        tokenizer=tokenizer,
        max_len=max_len
    )
    return DataLoader(
        ds,
        batch_size=batch_size,
        num_workers=4
    )

class SentimentClassifier(nn.Module):
    def __init__(self, n_classes):
        super(SentimentClassifier, self).__init__()
        self.bert = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
        self.drop = nn.Dropout(p=0.1)
        self.out = nn.Linear(self.bert.config.hidden_size, n_classes)

    def forward(self, input_ids, attention_mask):
        # Feed input to BERT; the pooled [CLS] output feeds the classifier head.
        # (Tuple unpacking assumes a transformers version earlier than 4.x.)
        _, pooled_output = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        output = self.drop(pooled_output)
        return self.out(output)

model = SentimentClassifier(len(nama_sentimen))
model = model.to(device)

# Move one batch's input ids to the device.
input_ids = data['input_ids'].to(device)

# Linear learning-rate schedule with no warmup over all training steps.
total_steps = len(train_data_loader) * EPOCHS
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=total_steps
)

def train_epoch(model, data_loader, loss_fn, optimizer, device, scheduler, n_examples):
    # (Signature and loop scaffolding are assumptions, mirroring eval_model below.)
    model = model.train()
    losses = []
    correct_predictions = 0

    for d in data_loader:
        input_ids = d["input_ids"].to(device)
        attention_mask = d["attention_mask"].to(device)
        targets = d["targets"].to(device)

        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        _, preds = torch.max(outputs, dim=1)
        loss = loss_fn(outputs, targets)

        correct_predictions += torch.sum(preds == targets)
        losses.append(loss.item())

        # Backpropagate with gradient clipping, then step the optimizer and scheduler.
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

    return correct_predictions.double() / n_examples, np.mean(losses)

def eval_model(model, data_loader, loss_fn, device, n_examples):
    model = model.eval()
    losses = []
    correct_predictions = 0

    with torch.no_grad():
        for d in data_loader:
            input_ids = d["input_ids"].to(device)
            attention_mask = d["attention_mask"].to(device)
            targets = d["targets"].to(device)

            # Forward pass (assumed to mirror train_epoch; the original lines were cut).
            outputs = model(input_ids=input_ids, attention_mask=attention_mask)
            _, preds = torch.max(outputs, dim=1)
            loss = loss_fn(outputs, targets)

            correct_predictions += torch.sum(preds == targets)
            losses.append(loss.item())

    return correct_predictions.double() / n_examples, np.mean(losses)

history = defaultdict(list)
best_accuracy = 0

for epoch in range(EPOCHS):
    print(f'Epoch {epoch + 1}/{EPOCHS}')
    print('-' * 10)

    train_acc, train_loss = train_epoch(
        model,
        train_data_loader,  # (arguments after `model` are reconstructed assumptions)
        loss_fn,
        optimizer,
        device,
        scheduler,
        len(df_train)
    )
    print(f'Train loss {train_loss} accuracy {train_acc}')

    val_acc, val_loss = eval_model(
        model,
        val_data_loader,
        loss_fn,
        device,
        len(df_val)
    )
    print(f'Val loss {val_loss} accuracy {val_acc}')
    print()

    # (History bookkeeping assumed; the curves are plotted below.)
    history['train_acc'].append(train_acc)
    history['val_acc'].append(val_acc)

    # Keep the checkpoint with the best validation accuracy.
    if val_acc > best_accuracy:
        torch.save(model.state_dict(), 'best_model_state.bin')
        best_accuracy = val_acc

plt.plot(history['train_acc'], label='train accuracy')

def get_predictions(model, data_loader):
    # (Header and loop scaffolding are assumptions around the surviving lines.)
    model = model.eval()
    review_texts = []
    predictions = []
    prediction_probs = []
    real_values = []

    with torch.no_grad():
        for d in data_loader:
            texts = d["review_text"]
            input_ids = d["input_ids"].to(device)
            attention_mask = d["attention_mask"].to(device)
            targets = d["targets"].to(device)

            outputs = model(input_ids=input_ids, attention_mask=attention_mask)
            _, preds = torch.max(outputs, dim=1)
            probs = torch.softmax(outputs, dim=1)

            review_texts.extend(texts)
            predictions.extend(preds)
            prediction_probs.extend(probs)
            real_values.extend(targets)

    predictions = torch.stack(predictions).cpu()
    prediction_probs = torch.stack(prediction_probs).cpu()
    real_values = torch.stack(real_values).cpu()
    return review_texts, predictions, prediction_probs, real_values

y_review_texts, y_pred, y_pred_probs, y_test = get_predictions(
    model,
    test_data_loader
)

print(classification_report(y_test, y_pred, target_names=nama_sentimen))

def show_confusion_matrix(confusion_matrix):
    hmap = sns.heatmap(confusion_matrix, annot=True, fmt="d", cmap="Blues")
    hmap.yaxis.set_ticklabels(hmap.yaxis.get_ticklabels(), rotation=0, ha='right')
    hmap.xaxis.set_ticklabels(hmap.xaxis.get_ticklabels(), rotation=30, ha='right')
    plt.ylabel('True sentiment')
    plt.xlabel('Predicted sentiment');

cm = confusion_matrix(y_test, y_pred)

df_cm = pd.DataFrame(cm, index=nama_sentimen, columns=nama_sentimen)

show_confusion_matrix(df_cm)

Appendix 2. Curriculum Vitae

Dwi Fimoza dwi.fimoza@gmail.com linkedin.com/in/dwifimoza

Education

Universitas Sumatera Utara, Majoring in Computer Science Sept 2016 – Present GPA 3.8 / 4.00; expected graduation in January 2021

Working Experience

Unilever Leadership Internship Programme (ULIP) Feb 2020 – May 2020 Intern in E-RTM, Customer Development.

• Collected and summarized sales reports from 8 (eight) provinces in Sumatera covering more than 125,000 outlets and 100 distributors.

• Created an improved dashboard to better summarize data by salesman, distributor, and area.

Ilmu Komputer Laboratory Center (IKLC) July 2018 – Jan 2020

Laboratory Assistant

• Courses taught include Web Programming, Semantic Web, Computer Graphics and Visualization, Advanced Database Management Systems, etc.

• Responsible for 7 classes so far; developed learning modules for each class and assisted the students.

• Graded the assignments and exams of all the students.

Kantor Pelayanan Kekayaan Negara dan Lelang Medan July 2019 – Aug 2019 Internship in General Subsection

• Analyzed, designed, and created an information system using the PHP programming language and MySQL.

• Made a web-based information system to archive documents based on the regulations of the Ministry of Finance for the Directorate General of State Assets and Auctions.

Organizational Experience

Ilmu Komputer Laboratory Center (IKLC) Oct 2018 – Jan 2019 Assessment Division

• Formulated, distributed, and compiled the students' grades from every laboratory assistant.

• Ensured all classes' assessments matched the curriculum and submitted them into the system on time.

Muslim Student Activity Unit Al Khuwarizmi USU May 2017 – April 2018 Syiar Division (Public Relations)

• Made digital designs for events handled by the Muslim Student Activity Unit Al-Khuwarizmi USU and content to post on social media.

Achievements

1st Winner, Website User Interface Festival Ajisaka 2019 Nov 2019 Department of Communication, Universitas Gadjah Mada

• Conceptualized how to encourage Gen-Z to reduce the usage of single-use plastics.

• Created and designed the interface of a microsite named Baik Berplastik.

XL Future Leaders Batch 6 Awardee Nov 2017 – Nov 2019 CSR Initiative by PT XL Axiata Tbk

• Chosen as 1 of 150 selected students across Indonesia from over 12,500 applicants.

• Participated in ongoing intensive training in three core competencies: effective communication, entrepreneurship and innovation, and managing change.

Runner Up, UX Comp 2019 June 2019

IMILKOM, Universitas Sumatera Utara

• Chosen as 2nd winner of the UX Competition held by IMILKOM USU among 10 finalists from various universities across Indonesia.

Projects

MinerMet Feb 2019 – Nov 2019

Researcher Team

• Collaborated with others to create an IoT project as a solution for problems occurring at the mining site.

• Designed the workflow using Photoshop.

• Finalized the concept for the exhibition held by PT XL Axiata Tbk.

SINAR – Sistem Informasi Pengarsipan July 2019 – Aug 2019 UI/UX Designer

• Responsible for designing the system's prototype to fit the users' needs and built the user interface using Bootstrap and jQuery.

Zahra Wedding Organizer Mar 2017 – July 2017

Backend Programmer

• Built the website using native PHP, HTML, JavaScript, and a MySQL database for an end-of-semester project.

Skills

• Software: Microsoft Office (Word, Excel, Power Point, Outlook), Adobe Photoshop, Adobe XD

• Programming: Python, HTML/CSS, PHP, Java

• Languages: Indonesian (native), English (fluent)
