Academic year: 2023
Depression Detection on Social Media Twitter Using Long Short-Term Memory

Hafshah Haudli Windjatika, Warih Maharani*

School of Computing, Informatics Study Program, Telkom University, Bandung, Indonesia
Email: 1[email protected], 2,*[email protected]

Corresponding Author Email: [email protected]

Abstract−Mental health problems remain significant worldwide, especially in Indonesia. The Ministry of Health of the Republic of Indonesia states that depression is experienced by adolescents aged 15 to 24 years. Sufferers are often unaware of their own depression, so social media becomes an intermediary for expressing their feelings in text form. This situation motivates research into detecting depressive disorder, that is, identifying Twitter users who experience depression. The data used come from 159 Twitter users, with 100 tweets taken for each username. In this research, we use Word2Vec for feature extraction and LSTM (Long Short-Term Memory) as the classification method. Word2Vec converts the data into vectors and captures the relations between words. LSTM is chosen because the dataset consists of tweets from the past, and this method is able to retain past information when making predictions. Classification is performed by training a model on the training data, such as tweets, and then using that model to process the test data. The test results produce an accuracy of 77.95% and an F1-score of 57.14%.

Keywords: Depression Detection; Twitter; Word2Vec; LSTM (Long Short-Term Memory)

1. INTRODUCTION

Nowadays, anxiety disorders and feelings of sadness are no longer trivial matters for teenagers. Over the last several years, data on depression cases among teenagers have increased significantly. Depression is a mood disorder that triggers deep feelings of sadness and diminishes interest in the things a person used to enjoy. It can begin with losing interest in hobbies, feeling lonely, and even giving up. Depressive disorder also makes a person emotionally unstable, for instance irritable and easily upset. It can affect daily productivity and disrupt the social relationships that are important in everyday life [1].

A 2017 disease burden calculation in Indonesia placed mental disorders, including depression, at the top of the mental disorder ranking. According to the Baseline Health Research of 2018, depressive disorder can occur in adolescents between 15 and 24 years old, with a prevalence reaching 6.2% [2]. Research by Gertrud, Sjur, Tore, and Else found that depression rates were higher in adolescents after puberty [3]. Research by the University of Pennsylvania found that high levels of depression are caused by the use of social media. Social media are digital platforms that allow users to interact and share their social activities. One of the most popular is Twitter, where users can post positive or negative tweets, and these tweets make depression detection possible [4].

There are several machine learning methods that can be used for this task, such as K-Nearest Neighbor (KNN) [5] and Naïve Bayes [6]. KNN is the simplest classification method: it classifies an inserted instance by looking up the nearest values, and high accuracy is obtained when the training data contain similar instances [5].

Naïve Bayes is a classification method based on the probability that data will occur in the future, and it is often preferred for company analysis [6]. Both of these approaches have weaknesses. KNN is sensitive to anomalous data and requires choosing the value of k (the number of nearest neighbors), which cannot be determined by a mathematical calculation. Naïve Bayes assumes that the variables are independent, and a conditional probability of zero for an unseen value can invalidate the prediction [7].

Based on the research above, this study detects depression using the Long Short-Term Memory (LSTM) approach. LSTM is a type of Recurrent Neural Network (RNN) commonly used for deep learning problems; it can handle long-term dependence on the input with a low error rate [8]. Besides the LSTM approach, this research also uses Word2Vec for feature extraction. Word2Vec is a word embedding, a phase that converts words into vectors; since deep learning models can basically only read numeric or vector input, a Word2Vec model is needed [9].

The measurement tool used to calculate the degree of depression is the DASS-42 model (Depression, Anxiety, and Stress Scales), which lists 42 symptoms to differentiate the symptoms of each disorder [10].

Research conducted by Hutomo used the DASS-42 to assess psychological impact during the COVID-19 pandemic [11]. The distributed questionnaire contains a scale of symptoms of stress, anxiety, and depression, which will be explained further in the data collection section.

Based on the explanation above, this research aims to detect depression on the social media platform Twitter by analyzing tweets, and to show that the method can succeed even when processing a small amount of data; the accuracy level indicates whether the model works on this limited dataset. The research focuses only on tweets in Bahasa Indonesia; other languages, such as English and local languages, are translated according to the dictionary used. The results of this research can be used by companies, especially in Indonesia, in the recruitment process, as a benchmark for whether a new employee experiences depressive disorder.

2. RESEARCH METHODOLOGY

2.1 System Design

The depression detection process in this research consists of several stages, illustrated in figure 1. The first step is collecting data by distributing a questionnaire and crawling data from the social media platform Twitter. The collected data then enter the preprocessing step, which cleans noise from the words. After preprocessing, feature extraction simplifies the model's reading of the input data. Before entering the model, the data are split into training data and testing data: the training data are used to train the model, while the testing data are used to test it. In the last step, evaluation is performed to identify the model's performance.

Figure 1. Research Method

2.2 Collecting Data

The data used in this research are tweets in Bahasa Indonesia from the social media platform Twitter. Data collection proceeds in three stages:

1. Questionnaire distribution

Before taking the data, a form based on the Depression, Anxiety, and Stress Scales (DASS-42) model is distributed as the tool for measuring the Twitter users. The DASS-42 contains 42 questions, with item numbers divided among symptoms of depression, anxiety, and stress. The division of items can be seen in table 1 [10]. Following that research, we use the depression category for detecting depression.

Table 1. The Item of Depression

Category   | Item Numbers
Depression | 3, 5, 10, 13, 16, 17, 21, 24, 26, 31, 34, 37, 38, 42
Anxiety    | 2, 4, 7, 9, 15, 19, 20, 23, 25, 28, 30, 36, 40, 41
Stress     | 1, 6, 8, 11, 12, 14, 18, 22, 27, 29, 32, 33, 35, 39

2. Crawling Data

Twint is used in the data crawling (collection) stage because it is easy to use: it needs no Twitter account, does not require the Twitter API, and has no limit. The user only needs to install Python, connect to the internet, and use a few Twint tools [12]. The raw data obtained from this crawl contain much information, such as the tweet id, the date and time the tweet was uploaded, the user id, the username, the tweet text, and others.

3. Labeling Data

After the data have been acquired, we use only two pieces of information: username and tweet. Labels are then assigned based on the item division in table 1. The depression label takes two values, positive (1) or negative (0) [10]. Positive and negative are determined by the severity of the depression score: 0-9 is negative and 10 or more is positive; see table 2 for examples of labeled data.
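The thresholding used in the labeling step above can be sketched as follows (a minimal illustration; the function name and score format are our assumptions, not from the paper):

```python
def label_depression(dass_depression_score: int) -> int:
    """Label a user from their DASS-42 depression subscale total.

    Following the paper's thresholds: scores 0-9 are labeled
    negative (0), scores of 10 or more are labeled positive (1).
    """
    return 1 if dass_depression_score >= 10 else 0

print(label_depression(7))   # negative -> 0
print(label_depression(15))  # positive -> 1
```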


Table 2. The Data Sample

User   | Tweet                                                                                                          | Label
User 1 | @markleebase https://t.co/UpfGU0ZQc8 @tanyakanrl alasannya demi bertahan hidup. udah mana yang putusin dokter  | Depresi
User 2 | Fagh tikhtokh Ai laik nurul budi ALLUKA COWOK ANJIM OEMJE wakwau Pengen men antea😪                             | Depresi
User 3 | Kenapa fyp jadi makanan semua si? Fase bingung akan segala hal -thv https://t.co/jtxGEHPB1X                    | Normal

2.3 Preprocessing

Preprocessing is done to remove noise from the data. The first step is case folding, which transforms capital letters into lowercase [13]. The next step is cleansing, in which hashtags, URLs, emoticons, and other special characters are erased from the text. Normalization then changes abbreviated words into their original forms and converts informal language into formal language using a word list in the dictionary. The next step is stopword removal, which deletes stop words, that is, words that carry no particular meaning [14]; for this we use the NLTK library together with a manual document containing the default word list. Stemming is then applied to strip affixes so that words are reduced to their base forms. The last step is tokenization, which splits the text into smaller units [13].
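A rough sketch of these preprocessing steps in Python is shown below. The regex patterns, the tiny normalization dictionary, and the stopword list are illustrative placeholders, not the actual resources used in the paper, and the stemming step is omitted here for brevity:

```python
import re

# Illustrative stand-ins for the paper's normalization dictionary
# and NLTK/manual stopword list.
NORMALIZATION = {"udah": "sudah", "yg": "yang"}
STOPWORDS = {"yang", "akan", "di", "ke", "dan"}

def preprocess(tweet: str) -> list[str]:
    text = tweet.lower()                                 # case folding
    text = re.sub(r"https?://\S+", " ", text)            # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # remove mentions/hashtags
    text = re.sub(r"[^a-z\s]", " ", text)                # strip emoji/punctuation
    tokens = text.split()                                # tokenization
    tokens = [NORMALIZATION.get(t, t) for t in tokens]   # normalization
    return [t for t in tokens if t not in STOPWORDS]     # stopword removal

print(preprocess("@tanyakanrl alasannya demi bertahan hidup https://t.co/x"))
# ['alasannya', 'demi', 'bertahan', 'hidup']
```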

2.4 Feature Extraction

Feature extraction in this research is performed using Word2Vec. Word2Vec is a word embedding that takes a text corpus as input and transforms the words of the corpus into vectors. The corpus comes from the unique words in the data, from which Word2Vec produces vectors. Word2Vec is needed here because a deep learning model cannot process data directly in text form, so the text must first be transformed into vectors. A vector is also used to measure how close a word is to other words [15]. Word2Vec has two model architectures, CBOW (Continuous Bag of Words) and Skip-Gram. CBOW predicts the current word from its surrounding context as input, whereas Skip-Gram takes the current word as input and predicts the surrounding words in the sentence [16].
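The closeness between word vectors mentioned above is typically measured with cosine similarity. A toy example follows (3-dimensional made-up vectors for illustration only; the real Word2Vec vectors in this research are 50- or 300-dimensional):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: words appearing in similar contexts get similar vectors.
vec_sedih = [0.9, 0.1, 0.3]   # "sedih" (sad)
vec_murung = [0.8, 0.2, 0.3]  # "murung" (gloomy)
vec_senang = [0.1, 0.9, 0.2]  # "senang" (happy)

print(cosine_similarity(vec_sedih, vec_murung))  # close to 1
print(cosine_similarity(vec_sedih, vec_senang))  # noticeably lower
```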

2.5 LSTM Classification

This research uses Long Short-Term Memory (LSTM), a deep learning method developed from the Recurrent Neural Network (RNN) algorithm. This development addresses a weakness of RNN, which is unable to retain past information over long time series when classifying data. LSTM is able to process data on a large scale because the model has a complete set of functions. In the LSTM classification step, the model is trained on the data, such as the vectors produced in the previous step; the inserted vectors are read one by one by the model, recurrently, until the end [17]. The data in an LSTM pass through the forget gate, input gate, cell state, and output gate, as shown in figure 2 [18].

Figure 2. LSTM Cell
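The flow through these gates, formalized in equations (1)-(6) below, can be traced with a minimal scalar sketch. This is a toy illustration using single numbers; real LSTM cells use weight matrices and vector-valued states:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One LSTM cell step following equations (1)-(6).

    w and b hold one scalar weight/bias per gate: 'f', 'i', 'c', 'o'.
    """
    s = x_t + h_prev                             # [x_t + h_(t-1)]
    f_t = sigmoid(w["f"] * s + b["f"])           # (1) forget gate
    i_t = sigmoid(w["i"] * s + b["i"])           # (2) input gate
    c_cand = math.tanh(w["c"] * s + b["c"])      # (3) candidate memory cell
    c_t = f_t * c_prev + i_t * c_cand            # (4) cell state
    o_t = sigmoid(w["o"] * s + b["o"])           # (5) output gate
    h_t = o_t * math.tanh(c_t)                   # (6) hidden state
    return h_t, c_t

w = {"f": 0.5, "i": 0.5, "c": 0.5, "o": 0.5}
b = {"f": 0.0, "i": 0.0, "c": 0.0, "o": 0.0}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, w=w, b=b)
```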


The explanation for each gate can be described as follows.

1. Forget Gate

The forget gate (f_t) processes the input value and the previous output, passing them through the sigmoid function, which maps values into the range 0 to 1. This gate determines whether the previous information will be forgotten: a value the sigmoid pushes to 0 multiplies the information out to exactly 0, so that information is omitted [19].

f_t = σ(W_f · [x_t + h_(t−1)] + b_f)   (1)

2. Input Gate

The input gate (i_t) mixes the previous output value with the input value and passes the result through two activation functions. One path passes through the sigmoid activation function, and the other passes through the tanh activation function, producing the candidate memory cell value (C̃_t). This process selects and determines the information to update and adds the new candidate vector for the next gate [20].

i_t = σ(W_i · [x_t + h_(t−1)] + b_i)   (2)
C̃_t = tanh(W_C · [x_t + h_(t−1)] + b_C)   (3)

3. Cell State

The cell state (C_t) is the sum of two terms: the first is the forget gate value multiplied by the previous cell state, and the second is the input gate value multiplied by the candidate memory cell value [20].

C_t = f_t × C_(t−1) + i_t × C̃_t   (4)

4. Output Gate

The output gate (o_t) produces an output generated from the combination of the previous value and the current value, passed through the sigmoid. The hidden state (h_t) is then obtained by multiplying this sigmoid output with the tanh of the cell state [20].

o_t = σ(W_o × [x_t + h_(t−1)] + b_o)   (5)
h_t = o_t × tanh(C_t)   (6)

2.6 Evaluation

The evaluation uses a confusion matrix, which presents the model's results as a table with four measurement values, shown in table 3. The labels in table 3 compare the original data with the predicted data. The parameters are: TP (true positive), a positive prediction that correctly matches the target; TN (true negative), a negative prediction that correctly matches the target; FP (false positive), a positive prediction that does not match the target; and FN (false negative), a negative prediction that does not match the target [9], [14].

Table 3. Confusion Matrix

                       True Values
                       Positive   Negative
Prediction  Positive   TP         FP
            Negative   FN         TN

From the confusion matrix we can compute the accuracy, precision, recall, and F1-score values using equations 7 to 10. Accuracy measures how many classifications match the target. Precision measures how accurate the model's positive predictions are with respect to the target data. Recall measures how successfully the model rediscovers the relevant information. The F1-score combines the comparison between precision and recall [14].

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%   (7)
Precision = TP / (TP + FP) × 100%   (8)
Recall = TP / (TP + FN) × 100%   (9)
F1-Score = 2 × (Precision × Recall) / (Precision + Recall) × 100%   (10)
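Equations (7)-(10) translate directly into code. A small self-contained example (the confusion-matrix counts here are made up for illustration):

```python
def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the four metrics of equations (7)-(10), in percent."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100   # (7)
    precision = tp / (tp + fp) * 100                   # (8)
    recall = tp / (tp + fn) * 100                      # (9)
    f1 = 2 * precision * recall / (precision + recall) # (10)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example confusion matrix: 5 TP, 3 TN, 2 FP, 1 FN.
metrics = evaluate(tp=5, tn=3, fp=2, fn=1)
```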

3. RESULT AND DISCUSSION

This section describes the results of the depression detection test using the LSTM method on 159 usernames, with 100 tweets taken per username to form the dataset. The dataset consists of 94 users with depression labels and 65 with normal labels; this label composition shows that the data used are not balanced.

The research goes through the preprocessing step, feature extraction, and LSTM classification. The results of the preprocessing step, which cleans noise from the data, are shown in table 4 for three example users. The purpose of preprocessing is to remove words with undefined meaning and symbols unreadable by the model, for instance emoticons, mentions, and other unnecessary elements.

Table 4. The Data Sample

User  | Tweet                                                                                                          | Preprocessing
User1 | @markleebase https://t.co/UpfGU0ZQc8 @tanyakanrl alasannya demi bertahan hidup. udah mana yang putusin dokter  | alas tahan hidup putus dokter
User2 | Fagh tikhtokh Ai laik nurul budi ALLUKA COWOK ANJIM OEMJE wakwau Pengen men antea                              | sial tikhtokh laik nurul budi alluka cowok anjing menantea
User3 | Kenapa fyp jadi makanan semua si? Fase bingung akan segala hal -thv https://t.co/jtxGEHPB1X                    | fyp makan fase bingung thv

After preprocessing, the next steps are Word2Vec feature extraction and LSTM classification. Word2Vec makes it possible for the LSTM to read the data, since deep learning cannot normally read data in text form. Figure 3 shows an example of the vector representation of the word 'sedih' with 50 dimensions; this vector can be used to run the LSTM classification. The words that have been converted into vectors are collected in a space called the vector space [9].

Figure 3. Vector of the word 'sedih' with dimension size 50

The LSTM classification focuses on the accuracy value and on the parameter tuning performed on the LSTM model. Parameter tuning is a method of finding the best parameters in order to obtain the best result, making visible each parameter's effect on the accuracy value, as shown in table 5 [21]. The LSTM unit parameter is the number of memory units, epoch is the number of training passes, batch size is the amount of data sampled per update, and Dense is the size of the dense layers.

Table 5. LSTM Parameter

Unit LSTM | Epoch | Batch Size | Dropout | Dense
100       | 3     | 8          | 0       | 128, 1

Several experiments were conducted in this research. The first experiment tests the feature size of Word2Vec, using sizes of 50 and 300. The second experiment compares the data split used for training and testing data. The first split is 80%:20%, with 127 training data and 32 testing data; the second split is 60%:40%, with 95 training data and 64 testing data. The feature sizes and data splits were chosen to show the differences in the model's classification results. The comparison results for each split are shown in table 6.
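The two splits can be reproduced with a simple ratio-based partition. This is a sketch; the paper does not specify whether shuffling or stratification was used:

```python
def split_data(records: list, train_ratio: float) -> tuple[list, list]:
    """Partition records into train/test by ratio (no shuffling shown)."""
    cut = int(len(records) * train_ratio)
    return records[:cut], records[cut:]

users = list(range(159))       # stand-in for the 159 Twitter users
train, test = split_data(users, 0.8)
print(len(train), len(test))   # 127 32
train2, test2 = split_data(users, 0.6)
print(len(train2), len(test2)) # 95 64
```

These counts match the 127/32 and 95/64 splits reported in the paper.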

Table 6. The Accuracy

Feature | Comparison | Precision | Recall  | F1-Score | Training Accuracy | Testing Accuracy
50      | 60%:40%    | 51.56%    | 100%    | 68.03%   | 64.21%            | 51.56%
50      | 80%:20%    | 60%       | 17.65%  | 27.27%   | 75.59%            | 50%
300     | 60%:40%    | 51.56%    | 100%    | 68.03%   | 67.37%            | 51.56%
300     | 80%:20%    | 55.56%    | 58.82%  | 57.14%   | 77.95%            | 53.12%

Based on the test results, the best accuracy is obtained using 300 features with the 80%:20% data split. This is because the longer the vector and the more data that are trained, the better the model learns to process the data.

The results also show that the tests overfit, because the data lack variety, with fewer than 200 samples used. As can be seen in the preprocessing results in table 4, some words were not processed properly, so the processed words do not match their meanings. Table 7 shows a successful depression detection test on tweets, using the experiment with the best training accuracy. This test compares the original DASS-42 labels from the first labeling with the prediction results of the built model, for a sample of three users matching the sample data in table 2.

Table 7. Username Prediction

Username | Label   | Prediction Result
User 1   | Depresi | 1 (depression)
User 2   | Depresi | 1 (depression)
User 3   | Normal  | 0 (normal)

4. CONCLUSION

Based on the depression detection test using the Word2Vec and LSTM methods with data from 159 Twitter users, the evaluation results are good. The best detection result uses Word2Vec with a feature size of 300 and LSTM with an 80%:20% data split, giving an accuracy of 77.95% and an F1-score of 57.14%.

It can be concluded that LSTM is able to perform depression detection even when processing a small amount of data, so this test can be used by companies that need to detect depression in new employees. To improve this research, we advise trying more varied data, especially for the questionnaire: making an agreement with respondents so that they do not need to make their accounts private if they agree to their data being used, and do not need to change their Twitter username, would make data collection easier. The preprocessing stage also needs improvement so that the resulting words are better. Finally, the tuning parameters should be adjusted to the data used, since the best values will not always be the same for different data.

REFERENCES

[1] A. A. Rachmawati, "Darurat Kesehatan Mental bagi Remaja," Egsa Ugm, 2020.

[2] "Situasi Kesehatan Jiwa di Indonesia," Kementerian Kesehatan Republik Indonesia, 2019. https://pusdatin.kemkes.go.id/article/view/20031100001/situasi-kesehatan-jiwa-di-indonesia.html (accessed Oct. 16, 2021).

[3] G. S. Hafstad, S. S. Sætren, T. Wentzel-Larsen, and E. M. Augusti, "Adolescents' symptoms of anxiety and depression before and during the Covid-19 outbreak – A prospective population-based study of teenagers in Norway," Lancet Reg. Heal. - Eur., vol. 5, 2021, doi: 10.1016/j.lanepe.2021.100093.

[4] A. A. Al Aziz, "Hubungan Antara Intensitas Penggunaan Media Sosial dan Tingkat Depresi pada Mahasiswa," Acta Psychol., vol. 2, no. 2, 2020, doi: 10.21831/ap.v2i2.35100.

[5] M. L. Joshi and N. Kanoongo, "Depression detection using emotional artificial intelligence and machine learning: A closer review," Mater. Today Proc., vol. 58, 2022, doi: 10.1016/j.matpr.2022.01.467.

[6] H. F. Putro, R. T. Vulandari, and W. L. Y. Saptomo, "Penerapan Metode Naive Bayes Untuk Klasifikasi Pelanggan," J. Teknol. Inf. dan Komun., vol. 8, no. 2, 2020, doi: 10.30646/tikomsin.v8i2.500.

[7] F. Sodik, B. Dwi, and I. Kharisudin, "Perbandingan Metode Klasifikasi Supervised Learning pada Data Bank Customers Menggunakan Python," J. Mat., vol. 3, 2020.

[8] W. Hastomo and A. Satyo, "Long Short Term Memory Machine Learning Untuk Memprediksi Akurasi Nilai Tukar IDR Terhadap USD," Pros. SeNTIK, vol. 3, no. 1, pp. 115–124, 2019.

[9] W. Widayat, "Analisis Sentimen Movie Review menggunakan Word2Vec dan metode LSTM Deep Learning," J. MEDIA Inform. BUDIDARMA, vol. 5, no. 3, 2021, doi: 10.30865/mib.v5i3.3111.

[10] S. Kusumadewi and H. Wahyuningsih, "Model Sistem Pendukung Keputusan Kelompok untuk Penilaian Gangguan Depresi, Kecemasan dan Stress Berdasarkan DASS-42," J. Teknol. Inf. dan Ilmu Komput., vol. 7, no. 2, 2020, doi: 10.25126/jtiik.2020721052.

[11] H. A. Maulana, "Psychological Impact of Online Learning during the COVID-19 Pandemic: A Case Study on Vocational Higher Education," Indones. J. Learn. Educ. Couns., vol. 3, no. 2, 2021, doi: 10.31960/ijolec.v3i2.833.

[12] A. Pragota, "Berkenalan dengan Twint." https://learn.nural.id/course/data-science/twitter-scrap/berkenalan-dengan-twint (accessed Jun. 28, 2022).

[13] Y. R. Sipayung and R. Sulistyowati, "Identifikasi Komentar Negatif Berbahasa Indonesia Pada Instagram Dengan Metode K-Means," Multimatrix, vol. 2, no. 1, 2020.

[14] R. Indira and W. Maharani, "Personality Detection on Social Media Twitter Using Long Short-Term Memory with Word2Vec," 2021, doi: 10.1109/COMNETSAT53002.2021.9530820.

[15] M. Rusli, M. R. Faisal, and I. Budiman, "Ekstraksi Fitur Menggunakan Model Word2Vec Untuk Analisis Sentimen Pada Komentar Facebook," Semin. Nas. Ilmu Komput., vol. 2, no. January 2019, 2019.

[16] A. Nurdin, B. Anggo Seno Aji, A. Bustamin, and Z. Abidin, "Perbandingan Kinerja Word Embedding Word2Vec, Glove, dan FastText pada Klasifikasi Teks," J. Tekno Kompak, vol. 14, no. 2, 2020, doi: 10.33365/jtk.v14i2.732.

[17] B. Lindemann, T. Müller, H. Vietz, N. Jazdi, and M. Weyrich, "A survey on long short-term memory networks for time series prediction," in Procedia CIRP, 2021, vol. 99, doi: 10.1016/j.procir.2021.03.088.

[18] S. Dobilas, "LSTM Recurrent Neural Networks — How to Teach a Network to Remember the Past," Medium. https://towardsdatascience.com/lstm-recurrent-neural-networks-how-to-teach-a-network-to-remember-the-past-55e54c2ff22e (accessed Jun. 28, 2022).

[19] A. C. Sitepu and M. Sigiro, "Analisis Fungsi Aktivasi Relu dan Sigmoid menggunakan optimizer SGD dengan Representasi MSE pada Model Backpropogation," Pros. SeNTIK, vol. 1, 2021.

[20] S. Ghimire, Z. M. Yaseen, A. A. Farooque, R. C. Deo, J. Zhang, and X. Tao, "Streamflow prediction using an integrated methodology based on convolutional neural network and long short-term memory networks," Sci. Rep., vol. 11, no. 1, 2021, doi: 10.1038/s41598-021-96751-4.

[21] A. E. Minarno, M. H. C. Mandiri, and M. R. Alfarizy, "Klasifikasi COVID-19 menggunakan Filter Gabor dan CNN dengan Hyperparameter Tuning," ELKOMIKA J. Tek. Energi Elektr. Tek. Telekomun. Tek. Elektron., vol. 9, no. 3, 2021, doi: 10.26760/elkomika.v9i3.493.
