YouTube Viewership Increase Analysis and Prediction using Facebook Prophet Model
Rezqie Hardi Pratama*, Putu Harry Gunawan
School of Informatics, Telkom University, Bandung, Indonesia Email: 1,*[email protected], 2[email protected]
Corresponding Author Email: [email protected]
Abstract−YouTube, a widely accessed video-sharing platform available through both mobile applications and web interfaces, serves as a medium for content creators, commonly referred to as YouTubers, to engage with their audience. The success of a YouTuber is intricately tied to their audience engagement, encompassing metrics such as total views, comments, and likes garnered by their videos. This study involves the analysis of 7,600 English-language videos uploaded on YouTube between August and September 2020. To assess the predictive success value of a video, the study employs the Facebook Prophet method.
Focusing on the upload time as a primary parameter, this method forecasts the growth in the number of YouTube viewers using datasets obtained from the YouTube API. Leveraging time-series modeling, Facebook Prophet processes data by considering audience interactions throughout a video broadcast. The results derived from the Facebook Prophet model indicate a predictive trend of increasing viewership on YouTube in the coming months. The evaluation of model linearity, measured using the R² score to gauge data reliability, reveals a score of 0.39 (39%), which indicates positive linearity, while the Pearson correlation yields a score of 0.75 (75%). This signifies the model's capability to reasonably predict the growth in the number of viewers, contributing valuable insights into the dynamics of YouTube audience engagement over time.
Keywords: Time Series; YouTube; Facebook Prophet; Prediction; R² Score; Pearson
1. INTRODUCTION
YouTube is a video-sharing website that facilitates its users to upload videos and then share them. Videos that have been uploaded can be watched by YouTube viewers [1]. YouTube viewers have the opportunity to create their personal accounts, enabling them to contribute and share their own content on the platform. Content creators on YouTube generate income through viewer engagement with their uploaded videos. Over time, there is an expected growth in the viewership of YouTube videos. Utilizing the YouTube API, data was crawled from English-language video sources uploaded between August 11 and September 12, 2020, amounting to 7,600 data entries.
In 2021, Dwin Indrawan et al. conducted a study titled 'Predicting YouTube Video Increase Using the Deep Neural Network (DNN) Model.' The research outcomes revealed significant differences in accuracy values among various models. The Linear Regression model yielded an R² score of 3% (0.03), considerably lower than the Naïve Bayes method, which resulted in an R² score of -8% (-0.08). This divergence implies opposing trend predictions.
On the other hand, when compared to the Artificial Neural Network (ANN) model utilizing the DNN as a benchmark, the best performance was observed with the Adam optimizer, showcasing an impressive R² score of 92% (0.92) [2].
Furthermore, in the same year, Kong Yih Hern from Malaysia studied Facebook user behavior using the Facebook Prophet method and Long Short-Term Memory (LSTM) across three distinct datasets, each offering different insights into user interactions. On the first dataset, the Facebook Prophet model achieved an R² score of 99.9% with an RMSE of 763,306, while LSTM achieved a lower R² score of 95.8% with an RMSE of 347,279. On the second dataset, the Facebook Prophet model maintained a high R² score of 99.9% with an RMSE of 5088,143, whereas LSTM delivered an R² score of 94.7% with an RMSE of 2283,617. On the third dataset, the Facebook Prophet method again performed strongly, registering an R² score of 99.5% with an RMSE of 328,844, while LSTM declined substantially in predictive accuracy, with an R² score of 58.5% and an RMSE of 783,955 [3].
Consistently across datasets, the Facebook Prophet model showed higher R² scores than LSTM. Although the Facebook Prophet's RMSE values generally indicated higher errors, under specific circumstances its RMSE dipped below that of LSTM without compromising the R² score, underscoring its predictive capability even amid higher error margins. This investigation illuminated the strengths of both models in interpreting Facebook user behavior dynamics.
The escalating count of YouTube video viewers significantly influences content creators. Aside from the revenue generated via advertisements on a content owner's video, the volume of views serves as a pivotal metric for content discoverability. As the viewership of a video increases, the revenue flowing into the content creator's YouTube account increases proportionally. Consequently, predictive analytics play a vital role in aiding YouTubers by enabling them to forecast the trajectory of their video's viewership growth rate. Such forecasts empower content creators to strategize effectively, projecting and managing the rapid expansion of their video audience [4].
This research analyzes and predicts the increase in the number of YouTube viewers watching English-language videos in the time span from August to September 2020, using the Facebook Prophet model to determine daily, monthly, and annual predictions. To compare with the DNN study above, the RMSE, R² score, and Pearson correlation are calculated to determine the accuracy of the scores obtained with this model. RMSE scores the model based on its error metric, while R² measures linearity in a range between 0-100% by measuring the relationship between the dependent variable and the regression model. Pearson correlation evaluates the linear relationship between two variables on a scale from -1 to 1, where the correlation can be positive, negative, or absent.
2. RESEARCH METHODOLOGY
2.1 System Design Flow
In this section, the stages are displayed in Figure 1 as a flowchart describing the flow of the model used to analyze and predict the increase in the number of YouTube viewers with the Facebook Prophet model. The first step is to prepare the dataset and preprocess it to maximize the later data analysis. After preprocessing, the data is split into test data and train data. The train data is used to fit the fbprophet algorithm, while the test data is used to obtain prediction values for forecasting. After splitting, the data is applied to the fbprophet model to obtain analysis and prediction values. The last step is to compute the performance analysis value based on the performance metrics provided by fbprophet, the R² score, or the Pearson correlation.
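A minimal sketch of this flow is given below. The CSV file name and the exact cutoff date are illustrative assumptions, not fixed by the paper; the ds/y column convention is explained in Section 2.3.

```python
# Minimal sketch of the flow in Figure 1: load, reduce, split, fit, forecast.
import pandas as pd
from prophet import Prophet  # published as "fbprophet" in older releases

df = pd.read_csv("youtube_trending.csv")                            # assumed file name
df = df.dropna()[["publishedAt", "view_count"]]                     # data reduction
df = df.rename(columns={"publishedAt": "ds", "view_count": "y"})
df["ds"] = pd.to_datetime(df["ds"], utc=True).dt.tz_localize(None)  # Prophet wants tz-naive dates

train = df[df["ds"] < "2020-09-10"]                                 # cutoff-date split
test = df[df["ds"] >= "2020-09-10"]

model = Prophet()                                                   # linear growth by default
model.fit(train)
future = model.make_future_dataframe(periods=7, freq="D")           # forecast one week ahead
forecast = model.predict(future)                                    # yhat, yhat_lower, yhat_upper
```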
Figure 1. System Design Flowchart
2.2 YouTube
YouTube is a video-sharing website that facilitates its users to upload videos and then share them. Videos that have been uploaded can be watched by YouTube viewers [1]. YouTube was founded in 2005, was purchased by Google in 2006, and became the site with the third most visitors after google.com and facebook.com [5]. Based on statistics, YouTube has a daily user base of 122 million people, with 500 hours of video uploaded every minute. With various genres spread across YouTube, the audience varies in age and gender [6].
2.3 Facebook Prophet
Facebook Prophet is a machine learning model used to predict time-series data based on non-linear trends with daily, weekly, monthly, or yearly seasonality, plus holiday effects [7]. Facebook Prophet is useful for forecasting growth. It provides a decomposable regression model that can be customized and is very easy to use for beginners [8]. The model also provides functions to perform cross-validation to calculate the performance error metrics, to add regressors, and to set the seasonality as the user wants. In addition, Facebook Prophet can detect trend changes automatically by selecting changepoints from the data. In general, Facebook Prophet combines three additive components plus an error term, as follows [9]:
$$y(t) = g(t) + s(t) + h(t) + e(t) \quad (1)$$
Where y(t) is the output of the prediction model, g(t) is the trend growth within a predetermined period, s(t) refers to periodic changes in the data in weekly, monthly, and yearly form, h(t) is a component for holidays or other important days, and e(t) is the error value of the data [10]. After preprocessing, the data is used for prediction with the Facebook Prophet model. The data needed to predict the increase in viewership is time data given by the publishedAt parameter together with view_count. The publishedAt column indicates time, so it is converted into the "ds" data-frame column, and the view_count column becomes the "y" column used as the reference for the prediction estimates [11]. Fbprophet offers two growth models for prediction, logistic and linear; since YouTube viewership data has no saturation restriction, the linear model is used here [12]. The linear model captures the overall trend of the data by marking specific changepoints where the historical data changes.
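The difference between the two growth modes can be sketched as follows. The data frame df with ds/y columns follows the convention above; the capacity value for the logistic case is an illustrative assumption, not taken from the paper.

```python
from prophet import Prophet

# df holds "ds" (from publishedAt) and "y" (from view_count), as above.

# Linear growth: no saturation ceiling; the mode used for viewership here,
# since view counts are not bounded.
m_linear = Prophet(growth="linear")
m_linear.fit(df)

# Logistic growth would instead require a carrying capacity ("cap") per row:
df_cap = df.copy()
df_cap["cap"] = df["y"].max() * 1.5  # illustrative ceiling, not from the paper
m_logistic = Prophet(growth="logistic")
m_logistic.fit(df_cap)
```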
2.4 Performance Analysis
In an algorithm, the accuracy of the data is required; therefore, performance analysis is needed to calculate the accuracy of the data. In this method, the R² score is used to calculate the level of linearity and accuracy, supported by the Root Mean Squared Error (RMSE) to calculate the resulting error value. The first step is cross-validation, which measures how good the model is based on the estimated time (horizon), the initial period of data (initial), and the distance between dates (period).
After cross-validation, the next step is to compute the performance metrics to obtain the accuracy value by measuring the average amount of error. RMSE uses the following calculation [13]:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \quad (2)$$
This metric squares each error (y_i - ŷ_i), averages over the number of data points n, and takes the square root. The lowest result indicates the best model [14]. In addition to RMSE, the R² score is needed to calculate the linearity of the data by measuring the relationship between the dependent variable and the regression model on a scale of 0-100%. The R² score is obtained from the total sum of squares and the residual sum of squares [15]. The total sum of squares is the actual value of the data (y_true) minus the mean actual value (mean y_true), squared; the residual sum of squares is the actual value of the data (y_true) minus the predicted value (y_pred), squared. The following is the formula for the R² score [16]:
$$R^2 = 1 - \frac{\sum (y_{true} - y_{pred})^2}{\sum (y_{true} - \bar{y}_{true})^2} \quad (3)$$
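Both metrics follow directly from the actual and predicted series. A small numeric sketch mirroring Eqs. (2) and (3); the arrays are illustrative values, not data from this study:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative actual values
y_pred = np.array([1.1, 1.9, 3.2, 3.8])  # illustrative predictions

# Eq. (2): square each error, average over n, take the root.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# Eq. (3): one minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
assert np.isclose(r2, r2_score(y_true, y_pred))  # matches scikit-learn
```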
Pearson correlation evaluates the linear relationship between two variables by assessing how closely the relationship can be described by a straight line. Pearson correlation ranges from -1 to 1, indicating a positive or negative correlation [17]. A negative value means the model describes a negative linear relationship: as one variable increases, the other decreases. A positive value describes a positive linear relationship: when one variable increases, the other also increases. If the Pearson correlation gives a value of 0 or close to 0, the two variables have no relationship, and if one changes, the other will not change much [18]. Its formula is shown as follows:
$$\rho(x, y) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma_x}\right)\left(\frac{y_i - \bar{y}}{\sigma_y}\right) \quad (4)$$
Here ρ(x, y) represents the Pearson correlation score between x and y, n is the number of observations, x_i and y_i are the individual data values in the dataset, x̄ and ȳ are the means of the variables x and y, and σ_x and σ_y represent the standard deviations of x and y, respectively.
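A sketch of Eq. (4) alongside SciPy's implementation; the arrays are illustrative stand-ins for y_true and y_pred:

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # e.g. actual values (illustrative)
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])  # e.g. predicted values (illustrative)

# Eq. (4): mean product of the standardized deviations (population std).
rho_manual = np.mean(((x - x.mean()) / x.std()) * ((y - y.mean()) / y.std()))

rho, p_value = pearsonr(x, y)  # SciPy's Pearson correlation
assert np.isclose(rho_manual, rho)
```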
3. RESULT AND DISCUSSION
3.1 Data Preprocessing
The dataset used was collected through the YouTube API and extracted from the YouTube Trending Video Dataset available on Kaggle. The data spans from August 11 to September 12, 2020, and comprises several key parameters: ‘publishedAt’, the date and time the video was uploaded; ‘categoryId’, categorical data according to the video genre; ‘view_count’, the number of views of a video; ‘likes’, the number of users who like the video; ‘dislikes’, the number of users who dislike the video; and ‘comment_count’, the number of comments on a video. The amount of view data obtained over the predetermined period is shown in Figure 2:
Figure 2. Time Series Data
Table 1. Data Sample
publishedAt           categoryId  view_count  likes    dislikes  comment_count
2020-08-11 19:20:14   22          1514614     156908   5855      35313
2020-08-13 18:36:52   23          485072      25319    319       1232
2020-08-17 16:02:22   10          1148924     67986    1466      4831
2020-08-22 22:16:53   1           344458      26691    157       2776
2020-08-30 17:00:08   24          6417155     861189   2870      20488
2020-09-09 16:15:11   1           1983816     156190   4138      10898
2020-09-12 21:12:19   28          431610      9879     410       1283
The data in Table 1 above shows the results of a series of stages that include collecting the dataset through the data crawling process. The processed data then goes through data reduction and normalization to ensure that the data presented has undergone optimal refinement and is consistent. The obtained dataset is preprocessed by performing data reduction, eliminating rows with null values, and then normalizing the data.
Figure 3. Data Preprocessing Flowchart
In Figure 3, the data reduction stage removes all rows that contain null values and removes columns that are not used in this research. This stage also reduces data noise caused by excessive data values. The data is then smoothed by taking the average value of each hour. After the data is reduced, it is normalized and divided into test data and train data using the cutoff-date method: train data is taken from the initial date of the data until September 10, while test data is taken after that date. The normalization stage rescales the data to a numerical range so that there is no data redundancy. Normalization has the following formula [19]:
$$N = \frac{x - x_{min}}{x_{max} - x_{min}} \quad (5)$$
At this stage, x is the original data value, x_min is the minimum value, x_max is the maximum value, and N is the normalization result in a certain range.
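A sketch of the whole preprocessing stage under the parameters described in this section (null removal, the reduction cap, hourly averaging, the cutoff split, (-1, 1) scaling, and window-20 smoothing); df is assumed to hold the raw publishedAt/view_count columns:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# df: raw rows with "publishedAt" (datetime) and "view_count" columns.
df = df.dropna()                                  # data reduction: drop null rows
df = df[df["view_count"] <= 1_250_000]            # view_count cap (reduced scenarios)

# Aggregate to one mean value per hour.
hourly = (df.set_index("publishedAt")["view_count"]
            .resample("H").mean().dropna()
            .rename("y").reset_index()
            .rename(columns={"publishedAt": "ds"}))

# Cutoff-date split: train until September 10, test afterwards.
train = hourly[hourly["ds"] < "2020-09-10"].copy()
test = hourly[hourly["ds"] >= "2020-09-10"].copy()

# Min-max normalization into (-1, 1), Eq. (5) rescaled to that range.
scaler = MinMaxScaler(feature_range=(-1, 1))
train["y"] = scaler.fit_transform(train[["y"]])

# Denoising: moving average with window size 20.
train["y"] = train["y"].rolling(window=20, min_periods=1).mean()
```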
Figure 4. Hourly Time Series Data after Reduction
Figure 4 above shows the dataset after the reduction process, which limits view_count values to 1,250,000 and then aggregates the data by averaging the values within each hour. The reduced data then goes through a normalization process to make it more structured, using sklearn with the range (-1, 1). The data also goes through a denoising process to improve data quality, using the data-smoothing method with a window_size of 20 to reduce noise in the data. All of these methods are applied to increase the level of data accuracy. The following graph in Figure 5 shows the result of the smoothed data:
Figure 5. Smoothed Reduced Data after Noise Reduction
3.2 Facebook Prophet Model
Figure 6. Facebook Prophet Forecasting Model: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4
Figure 6 above shows the data obtained over the predetermined period, with predictions taken for one week ahead. In the first scenario, the trend decreases but finally increases at the end of August; the red dotted lines mark the rapid changes in the data on those days. The second scenario still uses the view_count restriction of 1,250,000, but after data reduction the data does not go through the denoising stage, so it still contains a lot of noise: the graph shows data that is widespread and not centralized, the trend does not look significant, and no red dotted line appears, indicating no detected change in the data. In the third and fourth scenarios, the data is not reduced, meaning there is no view_count limit, so more data is processed, as can be seen in the graphs below.
Figure 7. Time Series Hourly Data without reduction
Figure 8. Smoothed Unreduced Data after Noise Reduction
Figures 7 and 8 show that while the third scenario still goes through data denoising, the fourth scenario undergoes no reduction or normalization at all, making it the least centralized data compared to the others. Although the third scenario has gone through the denoising stage, its graph shows the opposite of the first: the trend changes that occur tend to decrease. The fourth scenario, like the second, does not show any significant changes.
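The red dotted changepoint lines visible in Figure 6 can be reproduced with Prophet's plotting helper; a sketch, assuming the fitted model and forecast from the earlier steps:

```python
from prophet.plot import add_changepoints_to_plot

fig = model.plot(forecast)                             # forecast curve with intervals
add_changepoints_to_plot(fig.gca(), model, forecast)   # overlay detected changepoints
fig.savefig("forecast_with_changepoints.png")
```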
3.3 Performance Result
Table 2. Performance metrics scenario 1
Horizon       MSE       RMSE      MAE       MAPE      MdAPE     SMAPE     Coverage
1 day 15:00   0.020244  0.142281  0.119072  0.330385  0.292151  0.392360  0.694118
1 day 16:00   0.020525  0.143265  0.119205  0.332730  0.292151  0.395983  0.694118
1 day 17:00   0.021027  0.145007  0.120691  0.336084  0.298154  0.399739  0.690196
1 day 18:00   0.020693  0.143850  0.120282  0.333943  0.292151  0.397249  0.701961
1 day 19:00   0.020388  0.142786  0.118190  0.324343  0.277098  0.389477  0.717647
Table 3. Performance metrics scenario 2
Horizon       MSE       RMSE      MAE       MAPE      MdAPE     SMAPE     Coverage
1 day 16:00   0.050583  0.224906  0.183395  1.127106  0.337644  0.462833  0.741176
1 day 17:00   0.048417  0.220038  0.177666  1.035503  0.330378  0.445279  0.754118
1 day 18:00   0.047746  0.218509  0.176237  0.811969  0.330378  0.439446  0.756863
1 day 19:00   0.044572  0.211121  0.169143  0.798857  0.310334  0.423153  0.775294
1 day 20:00   0.044075  0.209940  0.168577  0.817724  0.305768  0.424456  0.776471
Table 4. Performance metrics scenario 3
Horizon       MSE       RMSE      MAE       MAPE      MdAPE     SMAPE     Coverage
1 day 16:00   0.139130  0.373002  0.304562  1.288213  0.756507  0.619155  0.336735
1 day 17:00   0.141772  0.376526  0.306805  1.293603  0.756507  0.620737  0.346939
1 day 18:00   0.145892  0.381957  0.312558  1.309138  0.756507  0.628739  0.336735
1 day 19:00   0.150590  0.388059  0.321420  1.316032  0.765840  0.641072  0.306122
1 day 20:00   0.155631  0.394501  0.329815  1.339323  0.871208  0.654396  0.285714
Table 5. Performance metrics scenario 4
Horizon       MSE       RMSE      MAE       MAPE      MdAPE     SMAPE     Coverage
1 day 16:00   0.027167  0.164825  0.093392  6.783520  1.044950  0.973658  0.891582
1 day 17:00   0.027433  0.165630  0.093440  6.577320  0.971183  0.953367  0.887755
1 day 18:00   0.025885  0.160888  0.091775  4.970091  0.965450  0.942566  0.889668
1 day 19:00   0.023511  0.153334  0.089999  4.925598  0.965450  0.936367  0.887755
1 day 20:00   0.025172  0.158656  0.093274  4.938989  0.957414  0.938048  0.882653
Tables 2 to 5 above show the cross-validation performed through the fbprophet library, based on the estimated time (horizon), the initial period of data (initial), and the distance between dates (period).
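This cross-validation can be reproduced with fbprophet's diagnostics module; a sketch using the initial, period, and horizon values described in the text, assuming the fitted model from Section 3.2:

```python
from prophet.diagnostics import cross_validation, performance_metrics

# Values follow the text: 15-day initial window, 1-day period, 15-day horizon.
df_cv = cross_validation(model, initial="15 days", period="1 days", horizon="15 days")

# Reports mse, rmse, mae, mape, mdape, smape, and coverage per horizon.
df_metrics = performance_metrics(df_cv)
print(df_metrics[["horizon", "rmse"]].head())
```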
The prediction setup must also be chosen carefully, because the choices differ significantly and affect the entire prediction process: different periods can produce different outputs, as can the initial data period and the estimated time [20]. The initial period covers the first 15 days of the monthly data collected earlier, and the data uses a 1-day distance between dates to obtain accurate data per day. The estimated time used is 15 days, so from each starting point the model produces forecast values for the next 15 days. Fbprophet cross-validation also directly reports the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Median Absolute Percentage Error (MdAPE), Symmetric Mean Absolute Percentage Error (SMAPE), and Coverage values. By calculating the y_true and y_pred values, which are the actual and predicted values of the data, the RMSE value can be obtained; only the RMSE value is used for this test. The RMSE obtained for the first scenario is 0.234498, the second is 0.231753, the third is 0.184498, and the fourth is 0.0881811, making the fourth scenario the best by RMSE. The following graphs show the RMSE performance of each scenario:
Figure 9. Scenario RMSE Graphs: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4
After the RMSE evaluation, the R² score is processed by finding the total sum of squares and the residual sum of squares; Figure 9 above shows the RMSE curves for each scenario. Once the two values are obtained, the R² value follows. Because there are four different scenarios, there are four different R² scores. Of the four, the first scenario, which went through the reduction and normalization stages, has the highest R² value at 0.39 or 39%, in contrast to the others. The second scenario only reaches a value of 0.03 or 3%. The third scenario has the worst score of all: even though it went through denoising, its R² value reaches -549%, which is extremely non-linear. The fourth scenario, which experiences no data reduction or denoising, has a higher R² score than the third, although it is still considered poor at -0.92 or -92%.
Based on the Pearson correlation formula, y_true and y_pred serve as the x and y values. After substituting these variables into the formula, the result reflects the correlation between them. The first scenario scores 0.75684, a positive linear correlation. The second scenario scores 0.19958, also a positive linear correlation but weaker than the first. In the third scenario, a score of -0.15138 signifies a negative linear correlation. Lastly, the fourth scenario displays an almost negligible correlation, recording a score of 0.03174 on the Pearson correlation scale.
Figure 10. Actual and Predicted Data: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4
Figure 10 presents a comparison between the actual data and the predicted data for each scenario. In the first scenario, there is a slight resemblance between the two series, resulting in a positive linear correlation. Scenario 2 demonstrates no resemblance whatsoever, while scenario 3 exhibits an inverse relationship between the two series. In scenario 4, there is a notable visual resemblance, yet the Pearson correlation score does not indicate any correlation between the variables.
4. CONCLUSION
Upon conducting the Root Mean Squared Error (RMSE) analysis, the R² score is determined by computing the total sum of squares and the residual sum of squares. Together with the Pearson correlation, these metrics offer insight into the relationship between the model's predictions and the actual data points. Among the four scenarios evaluated, the first, having undergone the reduction and normalization stages, exhibited the most promising performance, with the highest R² value at 0.39 or 39%, comprehensively outperforming the rest. In contrast, the second scenario yielded a notably lower R² value of 0.03 (3%), accompanied by a Pearson correlation of 0.19 (19%). The third scenario presented the most concerning results of all: despite undergoing denoising, its R² value plummeted to -549%, signifying an extremely non-linear relationship within the data.
Surprisingly, the fourth scenario, which lacked data reduction or denoising, outscored the third scenario in terms of R², albeit still with a considerably negative value of -0.92 or -92%. This raises questions about its efficacy despite a relatively strong RMSE. Interestingly, while the fourth scenario excelled in RMSE, its R² and Pearson correlation scores lagged behind. Notably, the Pearson correlation in the first scenario stood out positively at 0.75 or 75%, emphasizing a strong linear relationship in that setting. In light of these analyses, despite the fourth scenario's superior RMSE, the first scenario emerges as the most optimal: its substantial R² value, coupled with a robust Pearson correlation, solidifies its superiority among the scenarios considered. Therefore, the first scenario is the most viable and reliable choice for the model's predictive capabilities.
REFERENCES
[1] J. Arthurs, S. Drakopoulou, and A. Gandini, “Researching YouTube,” Convergence, vol. 24, no. 1, pp. 3–15, 2018, doi: 10.1177/1354856517737222.
[2] D. Indrawan, S. R. Cakrawijaya, B. D. Wicaksono, E. Erni, and W. Gata, “Prediksi Jumlah Penonton Video Youtube Menggunakan Model Deep Neural Network (DNN),” J. Inf. Syst. Informatics Comput., vol. 5, no. 1, p. 94, 2021, doi: 10.52362/jisicom.v5i1.463.
[3] K. Y. Hern, L. K. Yin, and C. W. Yoke, “Forecasting Facebook User Engagement Using Hybrid Prophet and Long Short-Term Memory Model,” in International Conference on Digital Transformation and Applications (ICDXA), Penang, Malaysia, 2021.
[4] M. S. Irshad, A. Anand, and M. Agarwal, “Modeling Active Life Span of YouTube Videos Based on Changing Viewership-Rate,” Revista Investigacion Operacional, 2020.
[5] S. Yang, D. Brossard, D. A. Scheufele, and M. A. Xenos, “The science of YouTube: What factors influence user engagement with online science videos?,” PLoS One, vol. 17, no. 5, pp. 1–19, 2022, doi: 10.1371/journal.pone.0267697.
[6] K. Yousaf and T. Nawaz, “A Deep Learning-Based Approach for Inappropriate Content Detection and Classification of YouTube Videos,” IEEE, 2022.
[7] E. Žunić, K. Korjenić, K. Hodžić, and D. Đonko, “Application of Facebook’s Prophet Algorithm for Successful Sales Forecasting Based on Real-world Data,” Int. J. Comput. Sci. Inf. Technol., vol. 12, no. 2, pp. 23–36, 2020, doi: 10.5121/ijcsit.2020.12203.
[8] M. Khayyat and K. Laabidi, “Time Series Facebook Prophet Model and Python for COVID-19 Outbreak Prediction,” Comput. Mater. Contin., 2021, doi: 10.32604/cmc.2021.014918.
[9] F. T. B. Sitepu, V. A. P. Sirait, and R. Yunis, “Analisis Runtun Waktu Untuk Memprediksi Jumlah Mahasiswa Baru Dengan Model Prophet Facebook,” Paradigma, 2021, doi: 10.31294/p.v23i1.9756.
[10] M. M. Hossain, N. Garg, A. H. M. F. Anwar, M. Prakash, and M. Bari, “Monthly Rainfall Prediction for Decadal Timescale using Facebook Prophet at Catchment Level,” ResearchGate, 2021.
[11] C. B. Aditya Satrio, W. Darmawan, B. U. Nadia, and N. Hanafiah, “Time series analysis and forecasting of coronavirus disease in Indonesia using ARIMA model and PROPHET,” Procedia Comput. Sci., vol. 179, pp. 524–532, 2021, doi: 10.1016/j.procs.2021.01.036.
[12] M. A. Haq, “CDLSTM: A Novel Model for Climate Change Forecasting,” Tech Science Press, 2021.
[13] T. O. Hodson, “Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not,” Geoscientific Model Development, 2022, doi: 10.5194/gmd-15-5481-2022.
[14] H. N. Yashinta, A. Aklis, and F. B. Sari, “Akurasi Analisis Time Series Dengan Metode Rmse Pada Forecasting Harga Saham Bank Syariah Yang Ada Di BEI,” Al-Muhtarifin Islam. Bank. …, vol. 1, no. 1, pp. 65–73, 2022. [Online]. Available: http://jurnal.umsu.ac.id/index.php/ALMUHTARIFIN/article/view/8980
[15] P. Calista, P. Yones, and S. Muthaiyah, “eWOM via the TikTok application and its influence on the purchase intention of somethinc products,” Asia Pacific Manag. Rev., vol. 28, no. 2, pp. 174–184, 2023, doi: 10.1016/j.apmrv.2022.07.007.
[16] F. Rustam et al., “COVID-19 Future Forecasting Using Supervised Machine Learning Models,” IEEE Access, vol. 8, pp. 101489–101499, 2020, doi: 10.1109/ACCESS.2020.2997311.
[17] E. Van Den Heuvel and Z. Zhan, “Myths About Linear and Monotonic Associations: Pearson’s r, Spearman’s ρ, and Kendall’s τ,” Am. Stat., pp. 1–19, 2022, doi: 10.1080/00031305.2021.2004922.
[18] Pawan and R. Dhiman, “Electroencephalogram channel selection based on pearson correlation coefficient for motor imagery-brain-computer interface,” Meas. Sensors, vol. 25, p. 100616, 2023, doi: 10.1016/j.measen.2022.100616.
[19] M. M. Muzakki and F. Nhita, “The spreading prediction of Dengue Hemorrhagic Fever (DHF) in Bandung regency using K-means clustering and support vector machine algorithm,” in 2018 6th Int. Conf. Inf. Commun. Technol. (ICoICT), 2018, pp. 453–458, doi: 10.1109/ICoICT.2018.8528782.
[20] A. Alsharef, K. Aggarwal, Sonia, M. Kumar, and A. Mishra, “Review of ML and AutoML Solutions to Forecast Time-Series Data,” Arch. Comput. Methods Eng., vol. 29, no. 7, pp. 5297–5311, Nov. 2022, doi: 10.1007/s11831-022-09765-0.