CHAPTER 6
Based on the results and discussion presented in the previous chapters, the author offers the following suggestions for future research:
1. In this study, the author retained 100 feature components in the LSA (TruncatedSVD) feature extraction when building the models, following the commonly recommended value. Future research should search over different numbers of components, compare the accuracy of the machine-learning classifiers for each count, and select the best-performing one, for example via principal component analysis (PCA) or silhouette validation; a minimal sketch is given after this list.
2. In this study, the author represented only a limited set of ASCII emoticons and Unicode emoji. Future research should expand the set of emoji that can be represented.
3. For contraction handling, especially trending English slang, the author hopes that future research will also expand the dictionary of slang and newly trending terms so that they can be represented with their full and correct meanings.
4. Besides producing a sentiment label for the analyzed input text, the implementation in this study also reports the aspects discussed in the text. The aspects used here were defined in an aspect dictionary referenced from the SKYTRAX website and supplemented through the author's manual inspection of data samples. Future research should therefore expand the set of predefined aspects that may be associated with the text data; a second sketch follows the first one below.
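As a starting point for the component search in suggestion 1, the following minimal sketch assumes the same TF-IDF + TruncatedSVD setup used in the appendix and compares several candidate component counts with silhouette validation; the candidate list, file name, and variable names are illustrative assumptions, not part of the original experiments.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import silhouette_score

# illustrative sketch: score several LSA component counts instead of
# fixing n_components=100 (file and column names assumed from the appendix)
df = pd.read_excel("singapore_airlines_sentiment.xlsx")
X_tfidf = TfidfVectorizer(max_df=0.5, ngram_range=(1, 1)).fit_transform(
    df["processed_text"].astype(str))
labels = df["sentiment_label"]

for n in [25, 50, 100, 200, 300]:  # candidate counts are assumptions
    X_lsa = TruncatedSVD(n_components=n, random_state=42).fit_transform(X_tfidf)
    # a higher silhouette means the reduced space separates the labels better
    print(n, silhouette_score(X_lsa, labels))

The best-scoring count could then replace the fixed n_components=100 in the classification pipelines of the appendix.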
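For suggestion 4, the sketch below illustrates the kind of dictionary-based aspect tagging described above; the aspect names and keyword lists are placeholders, not the actual dictionary compiled from the SKYTRAX site and the data samples.

# illustrative sketch: tag a text with every aspect whose keywords appear;
# the keyword lists below are placeholders, not the study's dictionary
aspect_dictionary = {
    "seat": ["seat", "legroom", "recline"],
    "food": ["food", "meal", "drink"],
    "staff": ["staff", "crew", "attendant"],
}

def tag_aspects(text):
    tokens = set(text.lower().split())
    return [aspect for aspect, keywords in aspect_dictionary.items()
            if tokens.intersection(keywords)]

print(tag_aspects("the crew served a great meal"))  # ['food', 'staff']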
REFERENCES
Agarwal, B., Nayak, R., Mittal, N., & Patnaik, S. (Eds.). (2020). Deep Learning-Based Approaches for Sentiment Analysis. Springer. http://www.springer.com/series/16171
Ahuja, R., Chug, A., Kohli, S., Gupta, S., & Ahuja, P. (2019). The impact of features extraction on the sentiment analysis. Procedia Computer Science, 152, 341–348. https://doi.org/10.1016/j.procs.2019.05.008
Anandarajan, M., Hill, C., & Nolan, T. (2019). Text Preprocessing. In Practical Text Analytics (pp. 45–59). Springer. https://doi.org/10.1007/978-3-319-95663-3_4
Arafah, B., & Hasyim, M. (2019). The Language of Emoji in Social Media. KnE Social Sciences. https://doi.org/10.18502/kss.v3i19.4880
Avinash, M., & Sivasankar, E. (2019). A study of feature extraction techniques for sentiment analysis. Advances in Intelligent Systems and Computing, 814, 475–486. https://doi.org/10.1007/978-981-13-1501-5_41
Bayhaqy, A., Sfenrianto, S., Nainggolan, K., & Kaburuan, E. R. (2018). Sentiment analysis about E-commerce from tweets using decision tree, K-nearest neighbor, and naïve bayes. In 2018 International Conference on Orange Technologies (ICOT). IEEE.
Chaudhri, A. A., Saranya, S. S., & Dubey, S. (2021). Implementation Paper on Analyzing COVID-19 Vaccines on Twitter Dataset Using Tweepy and Text Blob (Vol. 25). http://annalsofrscb.ro
Duei Putri, D., Nama, G. F., & Sulistiono, W. E. (2022). Analisis Sentimen Kinerja Dewan Perwakilan Rakyat (DPR) Pada Twitter Menggunakan Metode Naive Bayes Classifier. Jurnal Informatika Dan Teknik Elektro Terapan, 10(1). https://doi.org/10.23960/jitet.v10i1.2262
Guia, M., Silva, R. R., & Bernardino, J. (2019). Comparison of Naive Bayes, support vector machine, decision trees and random forest on sentiment analysis. IC3K 2019 - Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 1, 525–531. https://doi.org/10.5220/0008364105250531
Han, K. X., Chien, W., Chiu, C. C., & Cheng, Y. T. (2020). Application of support vector machine (SVM) in the sentiment analysis of twitter dataset. Applied Sciences (Switzerland), 10(3). https://doi.org/10.3390/app10031125
Hasanli, H., & Rustamov, S. (2019). Sentiment Analysis of Azerbaijani twits Using Logistic Regression, Naive Bayes and SVM.
Hussein, D. M. E. D. M. (2018). A survey on sentiment analysis challenges. Journal of King Saud University - Engineering Sciences, 30(4), 330–338. https://doi.org/10.1016/j.jksues.2016.04.002
Kaur, H., Ahsaan, S. U., Alankar, B., & Chang, V. (2021). A Proposed Sentiment Analysis Deep Learning Algorithm for Analyzing COVID-19 Tweets. Information Systems Frontiers, 23(6), 1417–1429. https://doi.org/10.1007/s10796-021-10135-7
Keerthi Kumar, H. M., & Harish, B. S. (2018). Classification of short text using various preprocessing techniques: An empirical evaluation. In Advances in Intelligent Systems and Computing (Vol. 709, pp. 19–30). Springer Verlag. https://doi.org/10.1007/978-981-10-8633-5_3
Khairunnisa, S., Adiwijaya, A., & Al Faraby, S. (2021). Pengaruh Text Preprocessing terhadap Analisis Sentimen Komentar Masyarakat pada Media Sosial Twitter (Studi Kasus Pandemi COVID-19). JURNAL MEDIA INFORMATIKA BUDIDARMA, 5(2), 406. https://doi.org/10.30865/mib.v5i2.2835
Khalid, S. (2014). A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning (Vol. 372). www.conference.thesai.org
Mothe, J., Son, L. H., & Nguyen, T. Q. V. (Eds.). (2019). Proceedings of 2019 11th International Conference on Knowledge and Systems Engineering (KSE 2019), October 24–26, 2019, Da Nang, Vietnam. IEEE.
Naseem, U., Razzak, I., & Eklund, P. W. (2021). A survey of pre-processing techniques to improve short-text quality: a case study on hate speech detection on twitter. Multimedia Tools and Applications, 80(28–29), 35239–35266. https://doi.org/10.1007/s11042-020-10082-6
Orkphol, K., & Yang, W. (2019). Sentiment Analysis on Microblogging with K-Means Clustering and Artificial Bee Colony. International Journal of Computational Intelligence and Applications, 18(3). https://doi.org/10.1142/S1469026819500172
Mahawardana, P. P. O., Sasmita, G. M. A., & Pratama, I P. A. E. (2022). Analisis Sentimen Berdasarkan Opini dari Media Sosial Twitter terhadap "Figure Pemimpin" Menggunakan Python. JITTER - Jurnal Ilmiah Teknologi dan Komputer, 3(1).
Poornima, A., & Sathiya Priya, K. (2020). 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). IEEE.
Prakash Pokharel, B. (2020). Twitter Sentiment analysis during COVID-19 Outbreak in Nepal. https://ssrn.com/abstract=3624719
Rhanoui, M., Mikram, M., Yousfi, S., & Barzali, S. (2019). A CNN-BiLSTM Model for Document-Level Sentiment Analysis. Machine Learning and Knowledge Extraction, 1(3), 832–847. https://doi.org/10.3390/make1030048
Safitri, R. (2018). Jurnal Tibanndaru, 2(2).
Sailunaz, K., & Alhajj, R. (2019). Emotion and sentiment analysis from Twitter text. Journal of Computational Science, 36. https://doi.org/10.1016/j.jocs.2019.05.009
Sari, I. P., Jannah, A., Meuraxa, A. M., Syahfitri, A., & Omar, R. (2022). Perancangan Sistem Informasi Penginputan Database Mahasiswa Berbasis Web. Hello World Jurnal Ilmu Komputer, 1(2), 106–110. https://doi.org/10.56211/helloworld.v1i2.57
Satuluri Vanaja, & Meena Belwal. (2018). Proceedings of the International Conference on Inventive Research in Computing Applications (ICIRCA 2018), July 11–12, 2018. IEEE.
Singapore Airlines is the World's Best Airline at 2023 World Airline Awards. (2023, June 20). https://skytraxratings.com/singapore-airlines-is-named-the-worlds-best-airline-at-the-2023-world-airline-awards
APPENDIX
Data Collection Code
import tweepy
import pandas as pd
from datetime import date

consumer_key = 'CEr9DvFVDa9avDRwpCgAdljwf'
consumer_secret = 'kIKaY8BJTegNVnI2vuXGuXYgboIA4CZteYgYsLc3SXZJuiqHl8'
access_token = '1400872399524876292-tR11xRNzLj0P0xI2hR4nuBOf2zIeKJ'
access_token_secret = 'jQEbiy3VJOXFVW3TWmvGSL3QSkxjGWVpO7BCUa1ZuIRDU'

# Authenticate with the Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Create API object
api = tweepy.API(auth)

# Search for tweets based on a keyword
keyword = ('"Singapore Airlines" OR @SingaporeAir OR "Singapore Airline" OR '
           '"singapore airlines" OR "singapore airline" OR '
           '@singaporeair OR #FlySQ OR #flysq')
hasilSearch = tweepy.Cursor(api.search_tweets, q=keyword,
                            tweet_mode='extended', count=10000,
                            result_type='recent', lang='en',
                            include_entities=True).items(10000)

# Store tweets in a dataframe for easier manipulation and analysis with pandas
df = pd.DataFrame(
    data=[[tweet.created_at, tweet.id, tweet.full_text,
           tweet.user.screen_name, tweet.user.location,
           tweet.user.followers_count, tweet.user.friends_count,
           tweet.user.verified, tweet.user.favourites_count,
           tweet.user.statuses_count, tweet.retweet_count,
           tweet.favorite_count, tweet.entities['hashtags'],
           tweet.entities['user_mentions']] for tweet in hasilSearch],
    columns=['created_at', 'tweet_id', 'tweet_text', 'username', 'location',
             'followers_count', 'friends_count', 'verified',
             'favourites_count', 'statuses_count', 'retweet_count',
             'favorite_count', 'hashtags', 'user_mentions'])

# Drop timezone info so the timestamps can be written to Excel
df['created_at'] = pd.to_datetime(df['created_at']).dt.tz_localize(None)

# Export the dataframe to CSV and Excel files named with today's date
df.to_csv('singapore_airlines_tweets_{}.csv'.format(date.today()), index=False)
df.to_excel('singapore_airlines_tweets_{}.xlsx'.format(date.today()), index=False)
df.tail()
Data Preprocessing Code
# preprocessing data
import re
import string
import contractions
import emoji
import nltk
import pandas as pd
from nltk import pos_tag
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# download required NLTK resources
nltk.download('stopwords')                   # stopword list
nltk.download('punkt')                       # tokenizer models
nltk.download('wordnet')                     # lemmatizer dictionary
nltk.download('averaged_perceptron_tagger')  # POS tagger
# preprocessing function (case folding, emoji handling, contraction handling,
# cleaning, stopword removal, lemmatization)
def preprocess_text(text):
    # Casefolding (lowercase)
    text = text.lower()

    # Cleaning 1
    # normalize text (collapse characters repeated three or more times)
    text = re.sub(r'(\w)\1{2,}', r'\1', text)
    # remove retweet marker
    text = re.sub(r'^rt[\s]+', '', text)
    # remove hyperlinks (http or https)
    text = re.sub(r'https?:\/\/.*[\r\n]*', '', text)
    # remove www URLs
    text = re.sub(r'www\.[^ ]+', '', text)
    # remove hashtag symbol
    text = re.sub(r'#', '', text)
    # remove mentions (usernames with letters, digits, and underscores)
    text = re.sub(r'@[a-zA-Z0-9_]+', '', text)

    # represent emoji
    # handle ASCII emoticons (convert ASCII emoticons to words)
    text = re.sub(r':\(', ' sad ', text)
    text = re.sub(r':\)', ' smile ', text)
    text = re.sub(r':d', ' smile ', text)
    text = re.sub(r':p', ' smile ', text)
    text = re.sub(r':o', ' shock ', text)
    text = re.sub(r':\*', ' kiss ', text)
    text = re.sub(r':\^\)', ' kiss ', text)
    text = re.sub(r':\^\(', ' cry ', text)
    text = re.sub(r't_t', ' cry ', text)
    text = re.sub(r'o_o', ' shock ', text)
    text = re.sub(r'x_x', ' paralyze ', text)
    text = re.sub(r'<3', ' love ', text)
    text = re.sub(r'e>', ' love ', text)
    text = re.sub(r'xd', ' laugh ', text)
    text = re.sub(r'\^_\^', ' smile ', text)  # ^ escaped: it is a regex anchor
    text = re.sub(r'x\)', ' smile ', text)
    text = re.sub(r'x\(', ' sad ', text)
    text = re.sub(r'xo', ' shock ', text)
    text = re.sub(r':v', ' neutral ', text)
    text = re.sub(r':3', ' smile ', text)
    text = re.sub(r'8\)', ' smile ', text)
    text = re.sub(r'8\(', ' sad ', text)
    text = re.sub(r'8/', ' neutral ', text)
    text = re.sub(r'8o', ' shock ', text)
    text = re.sub(r'8d', ' smile ', text)
    text = re.sub(r'8p', ' smile ', text)

    # handle Unicode emoji: convert each emoji to a :emoji_name: shortcode;
    # the default English names match the dictionary keys below once their
    # underscores are replaced with spaces
    text = emoji.demojize(text, delimiters=(':', ':'))
    # handle emoji text (map Unicode emoji names to the sentiment they convey)
    unicode_emoji = {
        "white heart": "love", "black heart": "love", "blue heart": "love",
        "green heart": "love", "yellow heart": "love", "purple heart": "love",
        "brown heart": "love", "orange heart": "love", "red heart": "love",
        "mending heart": "love", "heart on fire": "love",
        "broken heart": "sad", "heart exclamation": "sad",
        "heart decoration": "love", "two hearts": "love",
        "revolving hearts": "love", "beating heart": "love",
        "growing heart": "love", "sparkling heart": "love",
        "heart with ribbon": "love", "heart with arrow": "love",
        "love letter": "love", "kiss mark": "love",
        "folded hands": "pray", "handshake": "agree", "open hands": "agree",
        "heart hands": "love", "raising hands": "celebrate",
        "clapping hands": "celebrate", "thumbs up": "agree",
        "thumbs down": "disagree", "index pointing up": "agree",
        "backhand index pointing up": "agree", "middle finger": "disagree",
        "call me hand": "agree", "love you gesture": "love",
        "crossed fingers": "hope", "victory hand": "agree",
        "OK hand": "agree", "vulcan salute": "agree", "raised hand": "agree",
        "hand with fingers splayed": "agree", "raised back of hand": "agree",
        "waving hand": "agree", "pile of poo": "disgust",
        "skull and crossbones": "disgust", "skull": "disgust",
        "angry face with horns": "angry", "smiling face with horns": "evil",
        "face with symbols on mouth": "angry", "angry face": "angry",
        "pouting face": "angry", "face with steam from nose": "angry",
        "yawning face": "tired", "tired face": "tired", "weary face": "tired",
        "downcast face with sweat": "tired", "disappointed face": "sad",
        "persevering face": "tired", "confounded face": "sad",
        "face screaming in fear": "shock", "loudly crying face": "cry",
        "crying face": "cry", "sad but relieved face": "sad",
        "anxious face with sweat": "sad", "fearful face": "shock",
        "anguished face": "sad", "frowning face with open mouth": "sad",
        "pleading face": "sad", "flushed face": "shock",
        "astonished face": "shock", "hushed face": "shock",
        "face with open mouth": "shock", "frowning face": "sad",
        "slightly frowning face": "sad", "worried face": "sad",
        "confused face": "concern", "face with monocle": "concern",
        "nerd face": "concern", "smiling face with sunglasses": "happy",
        "partying face": "happy", "cowboy hat face": "happy",
        "exploding head": "shock", "face with spiral eyes": "shock",
        "face with crossed out eyes": "shock", "woozy face": "shock",
        "cold face": "shock", "hot face": "shock", "sneezing face": "sick",
        "face vomiting": "sick", "face with medical mask": "sick",
        "nauseated face": "sick", "face with head bandage": "sick",
        "face with thermometer": "sick", "sleeping face": "sleep",
        "drooling face": "sleep", "sleepy face": "sleep",
        "pensive face": "sad", "relieved face": "happy", "lying face": "sad",
        "face exhaling": "happy", "grimacing face": "sad",
        "face with rolling eyes": "sad", "unamused face": "sad",
        "smirking face": "happy", "face in clouds": "happy",
        "face without mouth": "sad", "expressionless face": "sad",
        "neutral face": "neutral", "face with raised eyebrow": "neutral",
        "zipper mouth face": "neutral", "thinking face": "neutral",
        "shushing face": "neutral", "face with peeking eye": "neutral",
        "face with open eyes and hand over mouth": "neutral",
        "face with hand over mouth": "neutral",
        "smiling face with open hands": "happy", "money mouth face": "happy",
        "squinting face with tongue": "happy", "zany face": "happy",
        "winking face with tongue": "happy", "face with tongue": "happy",
        "face savouring food": "happy", "smiling face with tear": "happy",
        "kissing face with smiling eyes": "love",
        "kissing face with closed eyes": "love",
        "smiling face with halo": "love", "smiling face": "happy",
        "kissing face": "love", "face blowing a kiss": "love",
        "star struck": "love", "smiling face with heart eyes": "love",
        "smiling face with hearts": "love", "winking face": "happy",
        "melting face": "love", "upside down face": "happy",
        "slightly smiling face": "happy", "face with tears of joy": "happy",
        "rolling on the floor laughing": "happy",
        "grinning face with sweat": "happy",
        "grinning squinting face": "happy", "grinning face": "happy",
        "beaming face with smiling eyes": "happy",
        "grinning face with big eyes": "happy",
        "grinning face with smiling eyes": "happy",
    }

    # replace each :emoji_name: shortcode with its mapped word; shortcodes
    # missing from the dictionary fall back to the emoji name itself
    # instead of raising a KeyError
    def map_emoji(match):
        name = match.group(1).replace('_', ' ')
        return ' ' + unicode_emoji.get(name, name.lower()) + ' '

    text = re.sub(r':([a-zA-Z_]+):', map_emoji, text)
    # remove extra whitespace
    text = re.sub(r'\s+', ' ', text)

    # represent contractions
    # handle slang words (convert slang words to their full form)
    slang_dictionary = {
        "lol": "laugh out loud", "brb": "be right back",
        "idk": "I do not know", "omg": "oh my god", "btw": "by the way",
        "ttyl": "talk to you later", "np": "no problem",
        "yw": "you are welcome", "thx": "thanks",
        "u": "you", "r": "are", "ur": "your", "y": "why",
        "ty": "thank you", "b4": "before", "sry": "sorry",
        "nvm": "nevermind", "im": "I am",
        "imo": "in my opinion", "rn": "right now", "idc": "I do not care",
        "ily": "I love you", "irl": "in real life",
        "afk": "away from keyboard", "tho": "though",
        "thru": "through", "thruout": "throughout",
        "q": "queue", "w": "with", "w/": "with", "w/o": "without",
        "w/out": "without", "w/e": "whatever", "w/ever": "whatever",
        "e.g.": "for example", "i.e.": "that is",
        "wbu": "what about you", "wyd": "what are you doing",
        "wya": "where are you at", "tbt": "throwback thursday",
        "tba": "to be announced", "tbc": "to be continued",
        "tbd": "to be determined", "yolo": "you only live once",
        "ftw": "for the win", "ftl": "for the loss", "omw": "on my way",
        "ppl": "people", "bc": "because", "b/c": "because",
        "b4n": "bye for now", "bfn": "bye for now",
        "bff": "best friends forever", "bffl": "best friends for life",
        "bday": "birthday", "b-day": "birthday",
        "cuz": "because", "cos": "because", "cus": "because",
        "bcoz": "because", "bcos": "because", "bcuz": "because",
        "fam": "family", "fav": "favorite", "fave": "favorite",
        "l": "lose", "oof": "discomfort", "sheesh": "damn",
        "turnt": "excited", "bet": "agree", "p": "positive",
        "mid": "not that good", "cap": "lying", "finna": "going to",
        "yeet": "throwing something away", "sus": "suspicious",
        "slay": "do something well", "stan": "obsess over",
        "touch grass": "go outside", "simping": "crushing",
        "iykyk": "if you know you know", "rizzle": "really",
        "rizzing": "flirting", "rizz": "flirt", "savage": "cool"
    }
    # handle contractions (convert contractions to full form)
    text = contractions.fix(text)

    # tokenize text
    text = word_tokenize(text)
    # replace slang words with their full form
    text = [slang_dictionary[word] if word in slang_dictionary else word
            for word in text]
    # join tokens back into a string
    text = ' '.join(text)

    # Cleaning 2
    # remove special characters (keep letters only)
    text = re.sub(r'[^a-zA-Z]', ' ', text)
    # remove punctuation
    text = re.sub(r'[^\w\s]', '', text)
    # remove extra whitespace
    text = re.sub(r'\s+', ' ', text)

    # StopWord Removal
    text = word_tokenize(text)
    stop_words = set(stopwords.words('english'))
    text = [word for word in text if word not in stop_words]
    text = ' '.join(text)
    # remove extra whitespace again
    text = re.sub(r'\s+', ' ', text)

    # POS Tagging
    text = word_tokenize(text)
    text = pos_tag(text, lang='eng')

    # map Penn Treebank tags to WordNet POS tags for the lemmatizer
    def get_wordnet_pos(tag):
        if tag.startswith('J'):
            return wordnet.ADJ
        elif tag.startswith('V'):
            return wordnet.VERB
        elif tag.startswith('N'):
            return wordnet.NOUN
        elif tag.startswith('R'):
            return wordnet.ADV
        else:
            return wordnet.NOUN

    text = [(word, get_wordnet_pos(tag)) for (word, tag) in text]

    # Lemmatization
    lemmatizer = WordNetLemmatizer()
    text = [lemmatizer.lemmatize(word, tag) for word, tag in text]

    # Joining
    text = ' '.join(text)
    # remove single characters
    text = re.sub(r'\s+[a-zA-Z]\s+', ' ', text)
    # remove extra whitespace again
    text = re.sub(r'\s+', ' ', text)
    return text

df = pd.read_excel("singapore_airlines_tweets_full.xlsx")
df["processed_text"] = df["tweet"].apply(preprocess_text)
df.to_csv("singapore_airlines_processed.csv", index=False)
df.to_excel("singapore_airlines_processed.xlsx", index=False)
df.tail()
Sentiment Labeling Code
# sentiment analysis (using VADER)
import pandas as pd
import matplotlib.pyplot as plt
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# download the VADER lexicon required by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# create sentiment analyzer
sentiment_analyzer = SentimentIntensityAnalyzer()

# compute the compound sentiment score for every preprocessed tweet
# (df is the preprocessed dataframe from the previous step)
sentiment_score = []
for text in df["processed_text"]:
    score = sentiment_analyzer.polarity_scores(text)
    sentiment_score.append(score["compound"])

# create sentiment labels (very negative, negative, neutral, positive,
# very positive) from the compound score
sentiment_label = []
for score in sentiment_score:
    if score <= -0.6:
        sentiment_label.append("very negative")
    elif score <= -0.2:
        sentiment_label.append("negative")
    elif score <= 0.2:
        sentiment_label.append("neutral")
    elif score <= 0.6:
        sentiment_label.append("positive")
    else:
        sentiment_label.append("very positive")

# plot the score distribution and the label distribution
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.hist(sentiment_score, bins=20)
plt.title("Sentiment Score")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.subplot(1, 2, 2)
plt.hist(sentiment_label, bins=5)
plt.title("Sentiment Label")
plt.xlabel("Label")
plt.ylabel("Frequency")
plt.tight_layout()
plt.savefig("sentiment_analysis.png")
plt.show()

# attach scores and labels to the dataframe and export
df["sentiment_score"] = sentiment_score
df["sentiment_label"] = sentiment_label
df.to_csv("singapore_airlines_sentiment.csv", index=False)
df.to_excel("singapore_airlines_sentiment.xlsx", index=False)
df.tail()
Decision Tree Classification Modeling Code
# LSA + TF-IDF as feature extraction for sentiment analysis (using DT)
import pickle
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# read data
df = pd.read_excel("singapore_airlines_sentiment.xlsx")

# ensure the text column is string typed
df["processed_text"] = df["processed_text"].astype(str)

# split data: 80% training and 20% testing
X = df["processed_text"]
y = df["sentiment_label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# create pipeline: TF-IDF weighting, LSA dimensionality reduction, then DT
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_df=0.5, min_df=0.0, ngram_range=(1, 1))),
    ("lsa", TruncatedSVD(n_components=100)),
    ("dt", DecisionTreeClassifier(criterion="entropy", max_features="sqrt"))
])

# train model
pipeline.fit(X_train, y_train)

# predict test data
y_pred = pipeline.predict(X_test)

# print accuracy score, classification report, and confusion matrix
print("Accuracy score: %0.4f" % accuracy_score(y_test, y_pred))
print("Classification report:")
print(classification_report(y_test, y_pred, digits=4))
print("Confusion matrix:")
print(confusion_matrix(y_test, y_pred))

# save model
with open("sentiment_model_dt_lsa.pkl", "wb") as file:
    pickle.dump(pipeline, file)

# load model
with open("sentiment_model_dt_lsa.pkl", "rb") as file:
    sentiment_model = pickle.load(file)
Logistic Regression Classification Modeling Code
# LSA + TF-IDF as feature extraction for sentiment analysis (using LR)
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# read data
df = pd.read_excel("singapore_airlines_sentiment.xlsx")

# split data: 80% training and 20% testing
X = df["processed_text"].astype(str)
y = df["sentiment_label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# create pipeline: TF-IDF weighting, LSA dimensionality reduction, then LR
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_df=0.5, min_df=0.0, ngram_range=(1, 1))),
    ("lsa", TruncatedSVD(n_components=100)),
    ("lr", LogisticRegression(penalty="l2", solver="saga"))
])