CHAPTER 6
Based on the results and discussion presented in the previous chapters, the author offers the following suggestions for future research:
1. In this study, the author retained 100 feature components in the LSA (TruncatedSVD) feature extraction when building the models, following the commonly recommended value. Future research should search over different numbers of components, compare the accuracy of the machine-learning classifiers for each count, and select the best-performing one, for example via principal component analysis (PCA) or silhouette validation; a minimal sketch is given after this list.
2. In this study, the author represented only a limited set of ASCII emoticons and Unicode emoji. Future research should expand the set of emoji that can be represented.
3. For contraction handling, especially trending English slang, the author hopes that future research will also expand the dictionary of slang and newly trending terms so that they can be represented with their full and correct meanings.
4. Besides producing a sentiment label for the analyzed input text, the implementation in this study also reports the aspects discussed in the text. The aspects used here were defined in an aspect dictionary referenced from the SKYTRAX website and supplemented through the author's manual inspection of data samples. Future research should therefore expand the set of predefined aspects that may be associated with the text data; a second sketch follows the first one below.
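As a starting point for the component search in suggestion 1, the following minimal sketch assumes the same TF-IDF + TruncatedSVD setup used in the appendix and compares several candidate component counts with silhouette validation; the candidate list, file name, and variable names are illustrative assumptions, not part of the original experiments.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import silhouette_score

# illustrative sketch: score several LSA component counts instead of
# fixing n_components=100 (file and column names assumed from the appendix)
df = pd.read_excel("singapore_airlines_sentiment.xlsx")
X_tfidf = TfidfVectorizer(max_df=0.5, ngram_range=(1, 1)).fit_transform(
    df["processed_text"].astype(str))
labels = df["sentiment_label"]

for n in [25, 50, 100, 200, 300]:  # candidate counts are assumptions
    X_lsa = TruncatedSVD(n_components=n, random_state=42).fit_transform(X_tfidf)
    # a higher silhouette means the reduced space separates the labels better
    print(n, silhouette_score(X_lsa, labels))

The best-scoring count could then replace the fixed n_components=100 in the classification pipelines of the appendix.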
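For suggestion 4, the sketch below illustrates the kind of dictionary-based aspect tagging described above; the aspect names and keyword lists are placeholders, not the actual dictionary compiled from the SKYTRAX site and the data samples.

# illustrative sketch: tag a text with every aspect whose keywords appear;
# the keyword lists below are placeholders, not the study's dictionary
aspect_dictionary = {
    "seat": ["seat", "legroom", "recline"],
    "food": ["food", "meal", "drink"],
    "staff": ["staff", "crew", "attendant"],
}

def tag_aspects(text):
    tokens = set(text.lower().split())
    return [aspect for aspect, keywords in aspect_dictionary.items()
            if tokens.intersection(keywords)]

print(tag_aspects("the crew served a great meal"))  # ['food', 'staff']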
REFERENCES
Agarwal, B., Nayak, R., Mittal, N., & Patnaik, S. (Eds.). (2020). Deep Learning-Based Approaches for Sentiment Analysis. Springer. http://www.springer.com/series/16171
Ahuja, R., Chug, A., Kohli, S., Gupta, S., & Ahuja, P. (2019). The impact of features extraction on the sentiment analysis. Procedia Computer Science, 152, 341–348. https://doi.org/10.1016/j.procs.2019.05.008
Anandarajan, M., Hill, C., & Nolan, T. (2019). Text Preprocessing. In Practical Text Analytics (pp. 45–59). Springer. https://doi.org/10.1007/978-3-319-95663-3_4
Arafah, B., & Hasyim, M. (2019). The Language of Emoji in Social Media. KnE Social Sciences. https://doi.org/10.18502/kss.v3i19.4880
Avinash, M., & Sivasankar, E. (2019). A study of feature extraction techniques for sentiment analysis. Advances in Intelligent Systems and Computing, 814, 475–486. https://doi.org/10.1007/978-981-13-1501-5_41
Bayhaqy, A., Sfenrianto, S., Nainggolan, K., & Kaburuan, E. R. (2018). Sentiment analysis about E-commerce from tweets using decision tree, K-nearest neighbor, and naïve bayes. In 2018 International Conference on Orange Technologies (ICOT). IEEE.
Chaudhri, A. A., Saranya, S. S., & Dubey, S. (2021). Implementation Paper on Analyzing COVID-19 Vaccines on Twitter Dataset Using Tweepy and Text Blob (Vol. 25). http://annalsofrscb.ro
Duei Putri, D., Nama, G. F., & Sulistiono, W. E. (2022). Analisis Sentimen Kinerja Dewan Perwakilan Rakyat (DPR) Pada Twitter Menggunakan Metode Naive Bayes Classifier. Jurnal Informatika Dan Teknik Elektro Terapan, 10(1). https://doi.org/10.23960/jitet.v10i1.2262
Guia, M., Silva, R. R., & Bernardino, J. (2019). Comparison of Naive Bayes, support vector machine, decision trees and random forest on sentiment analysis. IC3K 2019 - Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 1, 525–531. https://doi.org/10.5220/0008364105250531
Han, K. X., Chien, W., Chiu, C. C., & Cheng, Y. T. (2020). Application of support vector machine (SVM) in the sentiment analysis of twitter dataset. Applied Sciences (Switzerland), 10(3). https://doi.org/10.3390/app10031125
Hasanli, H., & Rustamov, S. (2019). Sentiment Analysis of Azerbaijani twits Using Logistic Regression, Naive Bayes and SVM.
Hussein, D. M. E. D. M. (2018). A survey on sentiment analysis challenges. Journal of King Saud University - Engineering Sciences, 30(4), 330–338. https://doi.org/10.1016/j.jksues.2016.04.002
Kaur, H., Ahsaan, S. U., Alankar, B., & Chang, V. (2021). A Proposed Sentiment Analysis Deep Learning Algorithm for Analyzing COVID-19 Tweets. Information Systems Frontiers, 23(6), 1417–1429. https://doi.org/10.1007/s10796-021-10135-7
Keerthi Kumar, H. M., & Harish, B. S. (2018). Classification of short text using various preprocessing techniques: An empirical evaluation. In Advances in Intelligent Systems and Computing (Vol. 709, pp. 19–30). Springer Verlag. https://doi.org/10.1007/978-981-10-8633-5_3
Khairunnisa, S., Adiwijaya, A., & Al Faraby, S. (2021). Pengaruh Text Preprocessing terhadap Analisis Sentimen Komentar Masyarakat pada Media Sosial Twitter (Studi Kasus Pandemi COVID-19). JURNAL MEDIA INFORMATIKA BUDIDARMA, 5(2), 406. https://doi.org/10.30865/mib.v5i2.2835
Khalid, S. (2014). A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning (Vol. 372). www.conference.thesai.org
Mothe, J., Son, L. H., & Nguyen, T. Q. V. (Eds.). (2019). Proceedings of 2019 11th International Conference on Knowledge and Systems Engineering (KSE 2019), October 24–26, 2019, Da Nang, Vietnam. IEEE.
Naseem, U., Razzak, I., & Eklund, P. W. (2021). A survey of pre-processing techniques to improve short-text quality: a case study on hate speech detection on twitter. Multimedia Tools and Applications, 80(28–29), 35239–35266. https://doi.org/10.1007/s11042-020-10082-6
Orkphol, K., & Yang, W. (2019). Sentiment Analysis on Microblogging with K-Means Clustering and Artificial Bee Colony. International Journal of Computational Intelligence and Applications, 18(3). https://doi.org/10.1142/S1469026819500172
Mahawardana, P. P. O., Sasmita, G. M. A., & Pratama, I P. A. E. (2022). Analisis Sentimen Berdasarkan Opini dari Media Sosial Twitter terhadap "Figure Pemimpin" Menggunakan Python. JITTER - Jurnal Ilmiah Teknologi dan Komputer, 3(1).
Poornima, A., & Sathiya Priya, K. (2020). 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). IEEE.
Prakash Pokharel, B. (2020). Twitter Sentiment analysis during COVID-19 Outbreak in Nepal. https://ssrn.com/abstract=3624719
Rhanoui, M., Mikram, M., Yousfi, S., & Barzali, S. (2019). A CNN-BiLSTM Model for Document-Level Sentiment Analysis. Machine Learning and Knowledge Extraction, 1(3), 832–847. https://doi.org/10.3390/make1030048
Safitri, R. (2018). Jurnal Tibanndaru, 2(2).
Sailunaz, K., & Alhajj, R. (2019). Emotion and sentiment analysis from Twitter text. Journal of Computational Science, 36. https://doi.org/10.1016/j.jocs.2019.05.009
Sari, I. P., Jannah, A., Meuraxa, A. M., Syahfitri, A., & Omar, R. (2022). Perancangan Sistem Informasi Penginputan Database Mahasiswa Berbasis Web. Hello World Jurnal Ilmu Komputer, 1(2), 106–110. https://doi.org/10.56211/helloworld.v1i2.57
Satuluri Vanaja, & Meena Belwal. (2018). Proceedings of the International Conference on Inventive Research in Computing Applications (ICIRCA 2018), July 11–12, 2018. IEEE.
Singapore Airlines is the World's Best Airline at 2023 World Airline Awards. (2023, June 20). https://skytraxratings.com/singapore-airlines-is-named-the-worlds-best-airline-at-the-2023-world-airline-awards
APPENDIX
Data Collection Code
import tweepy
import pandas as pd
from datetime import date

consumer_key = 'CEr9DvFVDa9avDRwpCgAdljwf'
consumer_secret = 'kIKaY8BJTegNVnI2vuXGuXYgboIA4CZteYgYsLc3SXZJuiqHl8'
access_token = '1400872399524876292-tR11xRNzLj0P0xI2hR4nuBOf2zIeKJ'
access_token_secret = 'jQEbiy3VJOXFVW3TWmvGSL3QSkxjGWVpO7BCUa1ZuIRDU'

# Authenticate with the Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Create API object
api = tweepy.API(auth)

# Search for tweets based on a keyword
keyword = ('"Singapore Airlines" OR @SingaporeAir OR "Singapore Airline" OR '
           '"singapore airlines" OR "singapore airline" OR '
           '@singaporeair OR #FlySQ OR #flysq')
hasilSearch = tweepy.Cursor(api.search_tweets, q=keyword,
                            tweet_mode='extended', count=10000,
                            result_type='recent', lang='en',
                            include_entities=True).items(10000)

# Store tweets in a dataframe for easier manipulation and analysis with pandas
df = pd.DataFrame(
    data=[[tweet.created_at, tweet.id, tweet.full_text,
           tweet.user.screen_name, tweet.user.location,
           tweet.user.followers_count, tweet.user.friends_count,
           tweet.user.verified, tweet.user.favourites_count,
           tweet.user.statuses_count, tweet.retweet_count,
           tweet.favorite_count, tweet.entities['hashtags'],
           tweet.entities['user_mentions']] for tweet in hasilSearch],
    columns=['created_at', 'tweet_id', 'tweet_text', 'username', 'location',
             'followers_count', 'friends_count', 'verified',
             'favourites_count', 'statuses_count', 'retweet_count',
             'favorite_count', 'hashtags', 'user_mentions'])

# Drop timezone info so the timestamps can be written to Excel
df['created_at'] = pd.to_datetime(df['created_at']).dt.tz_localize(None)

# Export the dataframe to CSV and Excel files named with today's date
df.to_csv('singapore_airlines_tweets_{}.csv'.format(date.today()), index=False)
df.to_excel('singapore_airlines_tweets_{}.xlsx'.format(date.today()), index=False)
df.tail()
Data Preprocessing Code
# preprocessing data
import re
import string
import contractions
import emoji
import nltk
import pandas as pd
from nltk import pos_tag
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# download required NLTK resources
nltk.download('stopwords')                   # stopword list
nltk.download('punkt')                       # tokenizer models
nltk.download('wordnet')                     # lemmatizer dictionary
nltk.download('averaged_perceptron_tagger')  # POS tagger
# preprocessing function (case folding, emoji handling, contraction handling,
# cleaning, stopword removal, lemmatization)
def preprocess_text(text):
    # Casefolding (lowercase)
    text = text.lower()

    # Cleaning 1
    # normalize text (collapse characters repeated three or more times)
    text = re.sub(r'(\w)\1{2,}', r'\1', text)
    # remove retweet marker
    text = re.sub(r'^rt[\s]+', '', text)
    # remove hyperlinks (http or https)
    text = re.sub(r'https?:\/\/.*[\r\n]*', '', text)
    # remove www URLs
    text = re.sub(r'www\.[^ ]+', '', text)
    # remove hashtag symbol
    text = re.sub(r'#', '', text)
    # remove mentions (usernames with letters, digits, and underscores)
    text = re.sub(r'@[a-zA-Z0-9_]+', '', text)

    # represent emoji
    # handle ASCII emoticons (convert ASCII emoticons to words)
    text = re.sub(r':\(', ' sad ', text)
    text = re.sub(r':\)', ' smile ', text)
    text = re.sub(r':d', ' smile ', text)
    text = re.sub(r':p', ' smile ', text)
    text = re.sub(r':o', ' shock ', text)
    text = re.sub(r':\*', ' kiss ', text)
    text = re.sub(r':\^\)', ' kiss ', text)
    text = re.sub(r':\^\(', ' cry ', text)
    text = re.sub(r't_t', ' cry ', text)
    text = re.sub(r'o_o', ' shock ', text)
    text = re.sub(r'x_x', ' paralyze ', text)
    text = re.sub(r'<3', ' love ', text)
    text = re.sub(r'e>', ' love ', text)
    text = re.sub(r'xd', ' laugh ', text)
    text = re.sub(r'\^_\^', ' smile ', text)  # ^ escaped: it is a regex anchor
    text = re.sub(r'x\)', ' smile ', text)
    text = re.sub(r'x\(', ' sad ', text)
    text = re.sub(r'xo', ' shock ', text)
    text = re.sub(r':v', ' neutral ', text)
    text = re.sub(r':3', ' smile ', text)
    text = re.sub(r'8\)', ' smile ', text)
    text = re.sub(r'8\(', ' sad ', text)
    text = re.sub(r'8/', ' neutral ', text)
    text = re.sub(r'8o', ' shock ', text)
    text = re.sub(r'8d', ' smile ', text)
    text = re.sub(r'8p', ' smile ', text)

    # handle Unicode emoji: convert each emoji to a :emoji_name: shortcode;
    # the default English names match the dictionary keys below once their
    # underscores are replaced with spaces
    text = emoji.demojize(text, delimiters=(':', ':'))
    # handle emoji text (map Unicode emoji names to the sentiment they convey)
    unicode_emoji = {
        "white heart": "love", "black heart": "love", "blue heart": "love",
        "green heart": "love", "yellow heart": "love", "purple heart": "love",
        "brown heart": "love", "orange heart": "love", "red heart": "love",
        "mending heart": "love", "heart on fire": "love",
        "broken heart": "sad", "heart exclamation": "sad",
        "heart decoration": "love", "two hearts": "love",
        "revolving hearts": "love", "beating heart": "love",
        "growing heart": "love", "sparkling heart": "love",
        "heart with ribbon": "love", "heart with arrow": "love",
        "love letter": "love", "kiss mark": "love",
        "folded hands": "pray", "handshake": "agree", "open hands": "agree",
        "heart hands": "love", "raising hands": "celebrate",
        "clapping hands": "celebrate", "thumbs up": "agree",
        "thumbs down": "disagree", "index pointing up": "agree",
        "backhand index pointing up": "agree", "middle finger": "disagree",
        "call me hand": "agree", "love you gesture": "love",
        "crossed fingers": "hope", "victory hand": "agree",
        "OK hand": "agree", "vulcan salute": "agree", "raised hand": "agree",
        "hand with fingers splayed": "agree", "raised back of hand": "agree",
        "waving hand": "agree", "pile of poo": "disgust",
        "skull and crossbones": "disgust", "skull": "disgust",
        "angry face with horns": "angry", "smiling face with horns": "evil",
        "face with symbols on mouth": "angry", "angry face": "angry",
        "pouting face": "angry", "face with steam from nose": "angry",
        "yawning face": "tired", "tired face": "tired", "weary face": "tired",
        "downcast face with sweat": "tired", "disappointed face": "sad",
        "persevering face": "tired", "confounded face": "sad",
        "face screaming in fear": "shock", "loudly crying face": "cry",
        "crying face": "cry", "sad but relieved face": "sad",
        "anxious face with sweat": "sad", "fearful face": "shock",
        "anguished face": "sad", "frowning face with open mouth": "sad",
        "pleading face": "sad", "flushed face": "shock",
        "astonished face": "shock", "hushed face": "shock",
        "face with open mouth": "shock", "frowning face": "sad",
        "slightly frowning face": "sad", "worried face": "sad",
        "confused face": "concern", "face with monocle": "concern",
        "nerd face": "concern", "smiling face with sunglasses": "happy",
        "partying face": "happy", "cowboy hat face": "happy",
        "exploding head": "shock", "face with spiral eyes": "shock",
        "face with crossed out eyes": "shock", "woozy face": "shock",
        "cold face": "shock", "hot face": "shock", "sneezing face": "sick",
        "face vomiting": "sick", "face with medical mask": "sick",
        "nauseated face": "sick", "face with head bandage": "sick",
        "face with thermometer": "sick", "sleeping face": "sleep",
        "drooling face": "sleep", "sleepy face": "sleep",
        "pensive face": "sad", "relieved face": "happy", "lying face": "sad",
        "face exhaling": "happy", "grimacing face": "sad",
        "face with rolling eyes": "sad", "unamused face": "sad",
        "smirking face": "happy", "face in clouds": "happy",
        "face without mouth": "sad", "expressionless face": "sad",
        "neutral face": "neutral", "face with raised eyebrow": "neutral",
        "zipper mouth face": "neutral", "thinking face": "neutral",
        "shushing face": "neutral", "face with peeking eye": "neutral",
        "face with open eyes and hand over mouth": "neutral",
        "face with hand over mouth": "neutral",
        "smiling face with open hands": "happy", "money mouth face": "happy",
        "squinting face with tongue": "happy", "zany face": "happy",
        "winking face with tongue": "happy", "face with tongue": "happy",
        "face savouring food": "happy", "smiling face with tear": "happy",
        "kissing face with smiling eyes": "love",
        "kissing face with closed eyes": "love",
        "smiling face with halo": "love", "smiling face": "happy",
        "kissing face": "love", "face blowing a kiss": "love",
        "star struck": "love", "smiling face with heart eyes": "love",
        "smiling face with hearts": "love", "winking face": "happy",
        "melting face": "love", "upside down face": "happy",
        "slightly smiling face": "happy", "face with tears of joy": "happy",
        "rolling on the floor laughing": "happy",
        "grinning face with sweat": "happy",
        "grinning squinting face": "happy", "grinning face": "happy",
        "beaming face with smiling eyes": "happy",
        "grinning face with big eyes": "happy",
        "grinning face with smiling eyes": "happy",
    }

    # replace each :emoji_name: shortcode with its mapped word; shortcodes
    # missing from the dictionary fall back to the emoji name itself
    # instead of raising a KeyError
    def map_emoji(match):
        name = match.group(1).replace('_', ' ')
        return ' ' + unicode_emoji.get(name, name.lower()) + ' '

    text = re.sub(r':([a-zA-Z_]+):', map_emoji, text)
    # remove extra whitespace
    text = re.sub(r'\s+', ' ', text)

    # represent contractions
    # handle slang words (convert slang words to their full form)
    slang_dictionary = {
        "lol": "laugh out loud", "brb": "be right back",
        "idk": "I do not know", "omg": "oh my god", "btw": "by the way",
        "ttyl": "talk to you later", "np": "no problem",
        "yw": "you are welcome", "thx": "thanks",
        "u": "you", "r": "are", "ur": "your", "y": "why",
        "ty": "thank you", "b4": "before", "sry": "sorry",
        "nvm": "nevermind", "im": "I am",
        "imo": "in my opinion", "rn": "right now", "idc": "I do not care",
        "ily": "I love you", "irl": "in real life",
        "afk": "away from keyboard", "tho": "though",
        "thru": "through", "thruout": "throughout",
        "q": "queue", "w": "with", "w/": "with", "w/o": "without",
        "w/out": "without", "w/e": "whatever", "w/ever": "whatever",
        "e.g.": "for example", "i.e.": "that is",
        "wbu": "what about you", "wyd": "what are you doing",
        "wya": "where are you at", "tbt": "throwback thursday",
        "tba": "to be announced", "tbc": "to be continued",
        "tbd": "to be determined", "yolo": "you only live once",
        "ftw": "for the win", "ftl": "for the loss", "omw": "on my way",
        "ppl": "people", "bc": "because", "b/c": "because",
        "b4n": "bye for now", "bfn": "bye for now",
        "bff": "best friends forever", "bffl": "best friends for life",
        "bday": "birthday", "b-day": "birthday",
        "cuz": "because", "cos": "because", "cus": "because",
        "bcoz": "because", "bcos": "because", "bcuz": "because",
        "fam": "family", "fav": "favorite", "fave": "favorite",
        "l": "lose", "oof": "discomfort", "sheesh": "damn",
        "turnt": "excited", "bet": "agree", "p": "positive",
        "mid": "not that good", "cap": "lying", "finna": "going to",
        "yeet": "throwing something away", "sus": "suspicious",
        "slay": "do something well", "stan": "obsess over",
        "touch grass": "go outside", "simping": "crushing",
        "iykyk": "if you know you know", "rizzle": "really",
        "rizzing": "flirting", "rizz": "flirt", "savage": "cool"
    }
    # handle contractions (convert contractions to full form)
    text = contractions.fix(text)

    # tokenize text
    text = word_tokenize(text)
    # replace slang words with their full form
    text = [slang_dictionary[word] if word in slang_dictionary else word
            for word in text]
    # join tokens back into a string
    text = ' '.join(text)

    # Cleaning 2
    # remove special characters (keep letters only)
    text = re.sub(r'[^a-zA-Z]', ' ', text)
    # remove punctuation
    text = re.sub(r'[^\w\s]', '', text)
    # remove extra whitespace
    text = re.sub(r'\s+', ' ', text)

    # StopWord Removal
    text = word_tokenize(text)
    stop_words = set(stopwords.words('english'))
    text = [word for word in text if word not in stop_words]
    text = ' '.join(text)
    # remove extra whitespace again
    text = re.sub(r'\s+', ' ', text)

    # POS Tagging
    text = word_tokenize(text)
    text = pos_tag(text, lang='eng')

    # map Penn Treebank tags to WordNet POS tags for the lemmatizer
    def get_wordnet_pos(tag):
        if tag.startswith('J'):
            return wordnet.ADJ
        elif tag.startswith('V'):
            return wordnet.VERB
        elif tag.startswith('N'):
            return wordnet.NOUN
        elif tag.startswith('R'):
            return wordnet.ADV
        else:
            return wordnet.NOUN

    text = [(word, get_wordnet_pos(tag)) for (word, tag) in text]

    # Lemmatization
    lemmatizer = WordNetLemmatizer()
    text = [lemmatizer.lemmatize(word, tag) for word, tag in text]

    # Joining
    text = ' '.join(text)
    # remove single characters
    text = re.sub(r'\s+[a-zA-Z]\s+', ' ', text)
    # remove extra whitespace again
    text = re.sub(r'\s+', ' ', text)
    return text

df = pd.read_excel("singapore_airlines_tweets_full.xlsx")
df["processed_text"] = df["tweet"].apply(preprocess_text)
df.to_csv("singapore_airlines_processed.csv", index=False)
df.to_excel("singapore_airlines_processed.xlsx", index=False)
df.tail()
Sentiment Labeling Code
# sentiment analysis (using VADER)
import pandas as pd
import matplotlib.pyplot as plt
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# download the VADER lexicon required by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# create sentiment analyzer
sentiment_analyzer = SentimentIntensityAnalyzer()

# compute the compound sentiment score for every preprocessed tweet
# (df is the preprocessed dataframe from the previous step)
sentiment_score = []
for text in df["processed_text"]:
    score = sentiment_analyzer.polarity_scores(text)
    sentiment_score.append(score["compound"])

# create sentiment labels (very negative, negative, neutral, positive,
# very positive) from the compound score
sentiment_label = []
for score in sentiment_score:
    if score <= -0.6:
        sentiment_label.append("very negative")
    elif score <= -0.2:
        sentiment_label.append("negative")
    elif score <= 0.2:
        sentiment_label.append("neutral")
    elif score <= 0.6:
        sentiment_label.append("positive")
    else:
        sentiment_label.append("very positive")

# plot the score distribution and the label distribution
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.hist(sentiment_score, bins=20)
plt.title("Sentiment Score")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.subplot(1, 2, 2)
plt.hist(sentiment_label, bins=5)
plt.title("Sentiment Label")
plt.xlabel("Label")
plt.ylabel("Frequency")
plt.tight_layout()
plt.savefig("sentiment_analysis.png")
plt.show()

# attach scores and labels to the dataframe and export
df["sentiment_score"] = sentiment_score
df["sentiment_label"] = sentiment_label
df.to_csv("singapore_airlines_sentiment.csv", index=False)
df.to_excel("singapore_airlines_sentiment.xlsx", index=False)
df.tail()
Decision Tree Classification Modeling Code
# LSA + TF-IDF as feature extraction for sentiment analysis (using DT)
import pickle
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# read data
df = pd.read_excel("singapore_airlines_sentiment.xlsx")

# ensure the text column is string typed
df["processed_text"] = df["processed_text"].astype(str)

# split data: 80% training and 20% testing
X = df["processed_text"]
y = df["sentiment_label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# create pipeline: TF-IDF weighting, LSA dimensionality reduction, then DT
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_df=0.5, min_df=0.0, ngram_range=(1, 1))),
    ("lsa", TruncatedSVD(n_components=100)),
    ("dt", DecisionTreeClassifier(criterion="entropy", max_features="sqrt"))
])

# train model
pipeline.fit(X_train, y_train)

# predict test data
y_pred = pipeline.predict(X_test)

# print accuracy score, classification report, and confusion matrix
print("Accuracy score: %0.4f" % accuracy_score(y_test, y_pred))
print("Classification report:")
print(classification_report(y_test, y_pred, digits=4))
print("Confusion matrix:")
print(confusion_matrix(y_test, y_pred))

# save model
with open("sentiment_model_dt_lsa.pkl", "wb") as file:
    pickle.dump(pipeline, file)

# load model
with open("sentiment_model_dt_lsa.pkl", "rb") as file:
    sentiment_model = pickle.load(file)
Logistic Regression Classification Modeling Code
# LSA + TF-IDF as feature extraction for sentiment analysis (using LR)
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# read data
df = pd.read_excel("singapore_airlines_sentiment.xlsx")

# split data: 80% training and 20% testing
X = df["processed_text"].astype(str)
y = df["sentiment_label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# create pipeline: TF-IDF weighting, LSA dimensionality reduction, then LR
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_df=0.5, min_df=0.0, ngram_range=(1, 1))),
    ("lsa", TruncatedSVD(n_components=100)),
    ("lr", LogisticRegression(penalty="l2", solver="saga"))
])