
Prediction of Bandung City Traffic Classification Using Machine Learning and Spatial Analysis


Academic year: 2023


Adhitya Aldira Hardy*, Aniq Atiqi Rohmawati, Sri Suryani Prasetyowati

Informatics, Telkom University, Bandung, Indonesia

Email: 1,*adhityahardy@student.telkomuniversity.ac.id, 2aniqatiqi@telkomuniversity.ac.id, 3srisuryani@telkomuniversity.ac.id

Corresponding Author Email: adhityahardy@student.telkomuniversity.ac.id

Abstract−This research proposes a visualization of a congestion classification map for Bandung City using machine learning and kriging interpolation. Naïve Bayes and Artificial Neural Network (ANN) are used to classify congestion, and simple kriging is used to visualize the predicted congestion classes as a spatial map based on the classification results of both methods. Naïve Bayes is a supervised learning method well suited to classification, and ANN is used here as a supervised learning method for prediction. The classification was performed on arterial and collector roads with 11 intersections that are congestion points. The data used is traffic counting data for Bandung City in April 2022. Congestion is divided into four categories based on the congestion level. This division causes data imbalance, so the Random Oversampling technique is used to overcome it. The ANN method achieves the higher accuracy, 93% with an RMSE value of 0.9746, while the Naïve Bayes method achieves an accuracy of 90% with an RMSE value of 0.9381. The resulting classification map shows that in April 2022, the southern area of Bandung City experienced the highest congestion compared to the northern, western, and eastern areas. This research identifies the better of the two algorithms and provides information on congestion in Bandung City through the congestion classification map, which can support efforts to reduce traffic congestion in the city.

Keywords: Classification; Traffic Congestion; Artificial Neural Network; Naïve Bayes; Simple Kriging

1. INTRODUCTION

Congestion is one of the main traffic problems in developing countries like Indonesia and occurs most often in densely populated cities. Bandung City is one such city. As the capital of West Java Province, Bandung hosts many business, economic, and government activities. The high mobility of its residents puts many motorists on the road at the same time. Congestion in Bandung often occurs in the morning, when workers go to work and students leave for school. It occurs again in the afternoon when people return from their activities, and these traffic jams often cause other traffic problems[1].

The negative impact of congestion can be alleviated using machine learning. Machine learning is the science of making intelligent machines, where an intelligent machine can be defined as one that reliably performs a task using human-like intelligence[2]. Machine learning algorithms can help traffic analysts solve congestion problems. Over the past few years, researchers have emphasized traffic forecasting in extensive research on intelligent transportation systems. As a result, traffic prediction has developed into one of the major research topics in traffic engineering[3]. Machine learning is applied by identifying the patterns produced by intelligent computation algorithms[4]. Scientists from various disciplines use time series methods to predict traffic congestion. The technique indicates traffic congestion effectively and can produce a fast and effective prediction model[5].

Many studies have performed congestion classification and prediction but have not produced a map visualization. Studies [6], [7], and [8] focus on the performance of classification models that predict traffic using an Artificial Neural Network (ANN). Study [6] classifies congestion from time series data in Dublin City. The ANN is trained with feed-forward and backpropagation passes and reaches 98% accuracy[6]. Study [7] classifies road accidents in Abu Dhabi into four levels of severity. Hierarchical clustering is used to divide the data into six different shapes and numbers of clusters. The data is then split into 66% training data and 34% test data. The ANN method reaches an accuracy of up to 84.1% on the training data and a prediction accuracy of 64.5%[7]. Study [8] classifies congestion from time series data in the City of Los Angeles. It applies the SMOTE technique to prevent overfitting, splits the data into 70% training data and 30% test data, and obtains 79.4% accuracy[8].

Classification and prediction using the Naïve Bayes method are also common, but likewise without map visualization, as in studies [9], [10], and [11]. Study [9] classifies congestion in Jakarta using Twitter data. The level of congestion in that study is divided into four classes, namely "Jammed," "Dense," "Dense Crawling," and "Stalled." The highest accuracy obtained was 68.66%[9]. Study [10] classifies accidents in the United Kingdom using big data. It compares the performance of the Naïve Bayes, C4.5, Random Forest, and AdaboostM1 classification algorithms. Data cleaning is carried out at the feature selection stage to achieve high accuracy and efficient computing time. The Naïve Bayes algorithm gives the best results, with an accuracy of up to 83.21%[10]. Study [11] predicts congestion using Heterogeneous Vehicular Networks (HetVNETs) data. It compares the performance of Naïve Bayes, Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Random Forest. The Naïve Bayes algorithm gives the highest results, with an average accuracy of 91.87%[11].

Simple Kriging interpolation is often used in map prediction. Study [12] predicts air pollution in Bandung City using Generalized Space-Time Autoregressive (GSTAR) and simple kriging. Preprocessing is done to obtain the average air pollution value for every year. The data is then weighted and the best parameters are estimated. Finally, the data is predicted using GSTAR and the prediction is mapped using Simple Kriging interpolation[12].

Based on the Naïve Bayes and ANN studies above, no research directly compares the performance of the two algorithms on imbalanced data and also visualizes the congestion map. Simple Kriging interpolation can be used to predict congestion maps. Therefore, this research compares the performance of the Naïve Bayes and ANN methods to determine the better congestion classifier. The classification results of both methods are then visualized as congestion maps using Simple Kriging interpolation, so that each map shows the level of congestion according to the classification produced by its method. The first goal is to find the better algorithm of the two. The second goal is to provide information on congestion in Bandung City.

2. RESEARCH METHODOLOGY

2.1 Research Phases

The system is built to classify traffic congestion using the Naïve Bayes method and an Artificial Neural Network. Both methods are evaluated in three scenarios: data without oversampling, data with oversampling, and data with oversampling plus hyperparameter tuning. The classification results are then mapped, and roads are predicted, using Simple Kriging interpolation. The system design flowchart is shown in Figure 1.

Figure 1. Work Flowchart

The following subsections explain the process shown in Figure 1.

2.2 Dataset

This research uses traffic counting data obtained from the ATCS (Area Traffic Control System) of Bandung City. The data is a time series with daily records for April 2022. It has 12 attributes: Road Name, Lane, Time, Motorbike, Car, Bus/Truck, Total, Headway (s), GAP (s), 85 P Speed (Km/H), Avg. Speed (Km/H), and Occupancy (%). Headway (s) is the time between a vehicle and the vehicle behind it when crossing an intersection. GAP (s) is the gap, measured in seconds, between the rear bumper of the front vehicle and the front bumper of the vehicle behind it. 85 P Speed (Km/H) is the 85th percentile speed, that is, the speed at or below which 85% of vehicles travel. Occupancy (%) describes how much of the road is occupied by vehicles. Further attributes were added to this data based on direct observation using Google Maps, and the "Day" and "Time" attributes were derived from the "Time" attribute in the ATCS data. The dataset obtained from ATCS is shown in Table 1.


Table 1. Raw Data

STATISTICS DATA
Location: UJUNG BERUNG
Lanes: Turn Left
Period: 01-04-2022, 07:00:00 to 17:00:00

No  Time              Motorbike  Car  Bus/Truck  Total  Headway (s)  GAP (s)  85 P Speed (Km/H)  Avg. Speed (Km/H)  Occupancy (%)
1   01-04-2022 16:00  422        100  87         609    2.02         2.85     73.25              50.44              4.42
2   01-04-2022 12:00  245        106  87         438    2.19         5.13     68.25              50.25              9.93
3   01-04-2022 07:00  325        32   40         397    4.22         8.67     80.75              62.31              7.68

Attributes needed to build the model were then added to this data. The added attributes "Latitude", "Longitude", "Time", and "Day" can be seen in Table 2.

Table 2. Processed Data

Road Name         Latitude   Longitude   Lanes      Time       Day     Date                 Motorbike  Car
SP. UJUNG BERUNG  -6.914068  107.699412  TURN LEFT  EVENING    FRIDAY  01-04-2022 16:00:00  422        100
SP. UJUNG BERUNG  -6.914068  107.699412  TURN LEFT  AFTERNOON  FRIDAY  01-04-2022 12:00:00  245        106
SP. UJUNG BERUNG  -6.914068  107.699412  TURN LEFT  MORNING    FRIDAY  01-04-2022 07:00:00  325        32

Table 2 and Table 3 are the datasets to be used, and Table 3 is a continuation of Table 2. In Table 3, the attributes added are "Width Road (m)" and "Queue Length (m)".

Table 3. Processed Data (continued)

Bus/Truck  Total  Headway (s)  GAP (s)  85 P Speed (Km/H)  Avg. Speed (Km/H)  Occupancy (%)  Width Road (m)  Queue Length (m)
87         609    2.02         2.85     73.25              50.44              4.42           6               20
87         438    2.19         5.13     68.25              50.25              9.93           6               25
40         397    4.22         8.67     80.75              62.31              7.68           6               25

2.3 Preprocessing

Data preprocessing is a multifaceted discipline that includes data preparation together with data integration, cleansing, normalization, and transformation. Data reduction tasks include feature selection, instance selection, discretization, and resampling techniques to handle imbalanced data[13]. In machine learning, this activity is essential to ensure that the data is properly formatted and that the information it contains can be interpreted[14]. In this research, preprocessing consists of labelling the categorical attributes, classifying the congestion level into four classes, and normalizing the data for building the ANN model.

2.4 Data Labeling

The dataset still contains categorical attributes that will be used to create the model, namely the "Day" and "Time" attributes. Therefore, a numerical value is assigned to each value of these categorical attributes, as shown in Table 4.


Table 4. Data Labelling

Attribute  Value                                                           Numeric Value
Day        MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY  0, 1, 2, 3, 4, 5, 6
Time       MORNING, AFTERNOON, EVENING                                     0, 1, 2

In dividing the level of congestion, this study classifies the attributes of "Occupancy" based on the rules of the Indonesian Road Capacity Manual 1997[15]. The "Occupancy" attribute has the highest value of up to 86%, so this study only results in the classification of congestion into four categories.

Table 5. Congestion Rate Data Labeling

Occupancy              Congestion Level   Label
Occupancy < 60%        Free Flow          0
60% < Occupancy ≤ 70%  Steady Flow        1
70% < Occupancy ≤ 80%  Stable Controlled  2
80% < Occupancy ≤ 90%  Unstable           3
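The labelling rules in Table 4 and Table 5 can be sketched as two small functions. This is a minimal illustration, not the authors' code: the function names are hypothetical, and because Table 5 does not say which class an occupancy of exactly 60% belongs to, the sketch assumes it falls in "Free Flow".

```python
# Numeric codes for the categorical attributes, copied from Table 4.
DAY_CODES = {"MONDAY": 0, "TUESDAY": 1, "WEDNESDAY": 2, "THURSDAY": 3,
             "FRIDAY": 4, "SATURDAY": 5, "SUNDAY": 6}
TIME_CODES = {"MORNING": 0, "AFTERNOON": 1, "EVENING": 2}

# Upper occupancy bounds (in percent) and labels, copied from Table 5.
CONGESTION_BINS = [(60.0, 0), (70.0, 1), (80.0, 2), (90.0, 3)]


def encode_day_time(day, time_of_day):
    """Map the "Day" and "Time" strings to their numeric codes."""
    return DAY_CODES[day.upper()], TIME_CODES[time_of_day.upper()]


def congestion_label(occupancy_pct):
    """Return the congestion label (0 to 3) for an occupancy percentage.

    Assumption: exactly 60% is treated as Free Flow, since Table 5
    leaves that boundary unspecified.
    """
    for upper_bound, label in CONGESTION_BINS:
        if occupancy_pct <= upper_bound:
            return label
    raise ValueError("occupancy above 90% is outside the ranges in Table 5")


print(encode_day_time("FRIDAY", "EVENING"))
print(congestion_label(4.42))
```

Keeping the thresholds in one list makes it easy to adjust them if the boundary handling in the source manual differs from the assumption made here.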

2.5 Normalization

Normalization is necessary when constructing the Artificial Neural Network model. Normalization rescales the attribute values to a common range so that attributes with large magnitudes do not dominate the others, which helps ensure that the data is of good quality[16]. Normalization also improves network performance and reduces errors in the training process. One normalization that can be used is Min-Max Normalization, with the formula[17]:

$$X_n = \frac{X_i - \mathrm{Min}X}{\mathrm{Max}X - \mathrm{Min}X} \tag{1}$$

Xn is the normalized value, Xi is the i-th original value, MaxX is the maximum value of X, and MinX is the minimum value of X[17].
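As a worked example of Min-Max Normalization, the sketch below rescales the three "Occupancy (%)" values from Table 1 to the range [0, 1]. The function name is hypothetical, and returning zeros for a constant column is an assumption made to avoid division by zero.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


occupancy = [4.42, 9.93, 7.68]  # Occupancy (%) values from Table 1
normalized = min_max_normalize(occupancy)
print(normalized)
```

The smallest value maps to 0, the largest to 1, and 7.68 maps to (7.68 - 4.42) / (9.93 - 4.42), roughly 0.59.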

2.6 Random Oversampling

This research uses the Random Oversampling technique to handle imbalanced data. Imbalanced data is a condition in which the classes contain very different numbers of records. This is a problem for classification because, during training, classifiers tend to predict the class with the most data at the expense of the minority classes. Random oversampling (ROS) randomly adds copies of minority class records to the training data. The addition is repeated until the amount of data in the minority class equals the amount of data in the majority class[18].
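The procedure described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the function name, the fixed seed, and the toy records are hypothetical, and the paper does not say which implementation it used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Toy training set: class 0 is the majority class.
rows = [[4.4], [5.1], [9.9], [7.7], [3.2], [65.0], [68.0], [86.0]]
labels = [0, 0, 0, 0, 0, 1, 1, 3]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling is applied to the training data only, so that duplicated records never leak into the test data.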

2.7 Classification

2.7.1 Naïve Bayes

In this process, also known as the training process, the Naive Bayes model is built and then tested using a confusion matrix to measure its performance. Naive Bayes can also serve as a supporting algorithm for neural networks and has been used to optimize vehicle acceleration, deceleration, and lane changes[19]. The Naive Bayes model contains a target variable (the model output) and several feature variables (the model input). Let T be the state or class of the target variable, and let the vector X = (x1, x2, ..., xn) be the state of the n features. To derive the value of T from X, the probability of T given X must be calculated first. By Bayes' theorem, the conditional probability of T given X is expressed as[20]:

$$p(T|X) = \frac{p(X|T)\,p(T)}{p(X)} \tag{2}$$

Here p(X) and p(T) are quantities that can be derived directly from the data, and p(X|T) remains to be resolved. Based on the independence assumption of Naive Bayes, p(X|T) can be factored into[20]:

$$p(X|T) = p(x_1, x_2, \ldots, x_n|T) = \prod_{i=1}^{n} p(x_i|T) \tag{3}$$

By combining the previous two equations, it can be formulated as follows[20]:

$$p(T|X) = \frac{p(T)}{p(X)} \prod_{i=1}^{n} p(x_i|T) \tag{4}$$

Here p(T), p(X), and p(𝑥𝑖|T) are the parameters of the Naive Bayes model. In this study, these parameters are estimated directly from the training data. The conditional distribution of T given X can therefore be calculated using equation (4). The classification result is the state of the target variable T with the highest probability. The result of this stage is a Naïve Bayes model that can be used for classification[20].
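The decision rule above can be illustrated with a small Gaussian Naive Bayes written from scratch. This is a sketch under stated assumptions, not the authors' implementation: each p(x_i|T) is modelled as a normal density, p(X) is dropped because it is the same for every class, logarithms are used to avoid numerical underflow, and the function names, the `var_floor` constant, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate p(T) and a per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # var_floor keeps variances positive (an assumed stabilizing constant).
        variances = [sum((v - m) ** 2 for v in col) / len(col) + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (prior, means, variances)
    return model


def predict(model, x):
    """Return the class T maximizing log p(T) + sum_i log p(x_i | T)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        return total
    return max(model, key=log_posterior)


# Toy one-feature data: low occupancy is class 0, higher occupancy is class 1.
X_train = [[4.4], [9.9], [7.7], [62.0], [66.0], [69.0]]
y_train = [0, 0, 0, 1, 1, 1]
nb_model = fit_gaussian_nb(X_train, y_train)
print(predict(nb_model, [8.0]), predict(nb_model, [64.0]))
```

Dropping p(X) does not change the predicted class, because it scales every class posterior by the same factor.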


2.7.2 Artificial Neural Network

The architecture of an Artificial Neural Network (ANN) is modelled on human neural networks. Each nerve cell (neuron) is composed of summation, activation, and output functions. Neurons in artificial neural networks are generally arranged in layers consisting of the input layer, hidden layer, and output layer[21]. Using a non-linear activation function improves ANN performance and speeds up training. This study uses the Rectified Linear Unit (ReLU). The ReLU function is formulated as follows[22]:

$$\mathrm{ReLU}(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

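Classification using deep learning usually uses the softmax function in the final (classification) layer. For a vector of K output scores x = (x_1, ..., x_K), the softmax function S is defined as follows[22]:

$$S(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{K} \exp(x_j)} \tag{6}$$

In the two-class case this reduces to the logistic function 1/(1 + exp(−x)) applied to the difference between the two scores.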

ReLU is usually used as the activation function of the hidden layers, and softmax is used as the classification function. During training, the network then adjusts its weight parameters[23].
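The two functions can be sketched in a few lines. The softmax below is written in its usual normalized-exponential form over a vector of class scores, and subtracting the maximum score before exponentiating is a standard stability trick that does not change the result. The four example scores are hypothetical.

```python
import math


def relu(x):
    """Rectified Linear Unit: x for positive inputs, 0 otherwise."""
    return x if x > 0 else 0.0


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [relu(v) for v in [-1.5, 0.0, 2.0]]
probabilities = softmax([2.0, 1.0, 0.1, -1.0])  # one score per congestion class
predicted_class = probabilities.index(max(probabilities))
print(hidden, probabilities, predicted_class)
```

The predicted congestion class is the index of the largest probability, which is also the index of the largest raw score.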

2.8 Simple Kriging

Simple kriging is a variant of the kriging method, a geostatistical method for estimating the value at a point or block from values observed at surrounding locations. It is one of the interpolation methods often used to make predictions on maps, and it has the simplest mathematical model among the kriging variants. Simple Kriging assumes that the mean m and the covariance of the function values are known. The variable Z at position s is modelled as follows[12]:

$$Z(s) = Y(s) + m \tag{7}$$

The value of m is the mean or average value, and the estimator $\hat{Z}(s)$ is[12]:

$$\hat{Z}(s) = \sum_{i=1}^{n} \lambda_i Z(s_i) \tag{8}$$

$Z(s_i)$ is the attribute value at the neighbouring point $s_i$ of $Z(s)$, and $\lambda_i$ is the weight assigned to that point. The weights are obtained by solving the kriging equations built from the covariances between locations[12].
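The sketch below shows one common textbook formulation of simple kriging, offered as an illustration rather than as the procedure run in ArcMap for this study. It applies the weights to the residuals Z(s_i) − m and adds the mean m back, which matches the decomposition Z(s) = Y(s) + m. The spherical covariance model, its `sill` and `rng` parameters, the coordinates, and the class values are all assumptions chosen for the example.

```python
import math


def spherical_cov(h, sill=1.0, rng=1.0):
    """Spherical covariance: decays from sill at h = 0 to 0 at h >= rng."""
    if h >= rng:
        return 0.0
    r = h / rng
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def simple_kriging(points, values, target, mean, sill=1.0, rng=1.0):
    """Estimate Z(target) as mean + sum_i lambda_i * (Z(s_i) - mean)."""
    cov = [[spherical_cov(math.dist(p, q), sill, rng) for q in points]
           for p in points]
    cov_to_target = [spherical_cov(math.dist(p, target), sill, rng)
                     for p in points]
    weights = solve(cov, cov_to_target)  # kriging weights lambda_i
    return mean + sum(w * (v - mean) for w, v in zip(weights, values))


# Hypothetical intersections (x, y) with predicted congestion classes.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [0.0, 1.0, 3.0]
known_mean = 1.0
estimate = simple_kriging(points, values, (0.5, 0.5), known_mean, rng=2.0)
print(estimate)
```

Two properties are easy to check with this sketch: at an observed location the estimate reproduces the observed value, and far from all observations (beyond the range) it falls back to the mean.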

2.9 Performance Analysis

This stage determines the effectiveness of each method by evaluating its performance with a confusion matrix. The confusion matrix measures how well the classifier recognizes records from different classes. Records that are classified correctly are counted as TP (True Positive) or TN (True Negative). In contrast, records that are classified incorrectly are counted as FP (False Positive) or FN (False Negative)[21]. The confusion matrix can be seen in Table 6.

Table 6. Confusion Matrix

                       Prediction Label
                       Positive  Negative
True Label  Positive   TP        FN
            Negative   FP        TN

Classification performance can be calculated using several formulas. The following metrics are used to evaluate the performance of multiclass classification models[24]:

a. Accuracy is the percentage of total records that are classified correctly, defined by the following formula[24]:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$

b. Precision is the ratio of records correctly identified as positive to the total records identified as positive, with the following formula[24]:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$

c. Recall is the ratio of records correctly identified as positive to the total actual positives, with the following formula[24]:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{11}$$
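For a four-class problem these metrics are computed per class and then averaged. The sketch below assumes a support-weighted average, which is one plausible reading of how the values reported later were aggregated; the function name and the toy labels are hypothetical. With this weighting, recall is algebraically equal to accuracy, which would explain why the Accuracy and Recall columns coincide in Tables 8 to 10.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision and recall."""
    n = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = count - tp
        cls_precision = tp / (tp + fp) if tp + fp else 0.0
        cls_recall = tp / (tp + fn)
        precision += (count / n) * cls_precision
        recall += (count / n) * cls_recall
    return accuracy, precision, recall


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
acc, prec, rec = weighted_metrics(y_true, y_pred)
print(acc, prec, rec)
```

In this toy case five of six records are correct, so accuracy and weighted recall are both 5/6, while weighted precision is 8/9 because one class-0 record was predicted as class 1.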

3. RESULT AND DISCUSSION

This chapter reports the tests that compare the performance of the Naïve Bayes and Artificial Neural Network (ANN) methods. The tests use three scenarios to provide a more detailed comparison. Each method outperformed the other in some scenarios, but in the end the ANN method outperformed the Naïve Bayes method. After the performance of each method is obtained, a congestion classification map is created from both methods. The following are the steps carried out in this research.

3.1 Data Collection

The dataset used in this study is processed data with a total of 3804 records divided into four labels, namely "Free Flow", "Steady Flow", "Stable Controlled", and "Unstable". 70% of the dataset is used as training data to train each method, while the remaining 30% is used as test data to test the model built. The comparison graph of the amount of data can be seen in Figure 2.

Figure 2. Comparison of Total Congestion Data

Figure 2 shows that the data is imbalanced. To keep the models from being biased toward the majority class, the Random Oversampling technique is used to randomly add minority class data to the training data, as described in [18]. The graph after performing Random Oversampling can be seen in Figure 3.

Figure 3. Random Oversampling

After the data is balanced, hyperparameter tuning is performed to find the best parameters and obtain high model accuracy. The parameters used can be seen in Table 7.

Table 7. Hyperparameter Tuning

Model        Parameter      Parameter Value        Best Parameter
Naïve Bayes  var_smoothing  0, -9                  0.0002848035868435802
ANN          batchSize      50, 100, 128, 256      50
ANN          epochs         4, 8, 10, 32, 64, 128  128
ANN          optimizer      SGD, RMSprop, Adam     SGD
ANN          activation     relu, softmax, tanh    ReLU
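An exhaustive grid search over the ANN values in Table 7 can be sketched as follows. The grid values are copied from the table, but everything else is an assumption: `grid_search` is a hypothetical helper, and `toy_score` is a made-up placeholder for a real validation score, so the combination it selects does not reproduce or confirm the paper's results.

```python
from itertools import product

# Candidate values for the ANN, copied from Table 7.
ann_grid = {
    "batchSize": [50, 100, 128, 256],
    "epochs": [4, 8, 10, 32, 64, 128],
    "optimizer": ["SGD", "RMSprop", "Adam"],
    "activation": ["relu", "softmax", "tanh"],
}


def grid_search(grid, score_fn):
    """Evaluate every combination in the grid and keep the best-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for validation accuracy (NOT a real model)."""
    return params["epochs"] / params["batchSize"]


n_combinations = len(list(product(*ann_grid.values())))
best_params, best_score = grid_search(ann_grid, toy_score)
print(n_combinations, best_params)
```

In practice `score_fn` would train the model with the given parameters and return a cross-validated accuracy, which makes the search cost grow with the 216 combinations in this grid.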

[Chart data for Figure 2: totals of 2050, 444, 155, and 13 records across the four classes (Free Flow, Steady Flow, Stable Controlled, Unstable). Chart data for Figure 3: 2050 records in every class after Random Oversampling.]


3.2 Implementation

This section compares the performance of the two models. The comparison is made through several trials on each machine learning model. The accuracy, precision, and recall values are the weighted averages for each model. The Naïve Bayes model used is the GaussianNB type, which is very commonly used for classification. For hyperparameter tuning, the "var_smoothing" parameter is searched with the GridSearch algorithm to find the best estimator. The Artificial Neural Network model is used in all three scenarios. Its input layer has 11 units. It is followed by three layers: a first hidden layer with 23 neurons and ReLU activation, a second hidden layer with eight neurons and ReLU activation, and an output layer with four neurons and softmax activation. The model performance comparison can be seen in Table 8, Table 9, and Table 10.

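To give a sense of the size of the network described above, the sketch below counts its trainable parameters. It assumes fully connected layers with one bias per neuron, which the text does not state explicitly; the layer sizes 11, 23, 8, and 4 are taken from the description, and the function name is hypothetical.

```python
def dense_parameter_count(sizes):
    """Count weights and biases of a fully connected network.

    Each layer contributes n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


layer_sizes = [11, 23, 8, 4]  # input, hidden 1, hidden 2, output
per_layer = [n_in * n_out + n_out
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_parameters = dense_parameter_count(layer_sizes)
print(per_layer, total_parameters)
```

Under these assumptions the three layers hold 276, 192, and 36 parameters, 504 in total, which is small compared with the amount of training data available after oversampling.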
Table 8. Model Comparison Before Oversampling

Model Accuracy Precision Recall

Naïve Bayes 87.74 91.30 87.74

ANN 75.83 57.50 75.83

Table 8 shows the first scenario, classification without oversampling. The Naïve Bayes model outperforms the ANN model with 87.74% accuracy, 91.30% precision, and 87.74% recall. The ANN model only reaches 75.83% accuracy, 57.50% precision, and 75.83% recall. Under this condition the Naïve Bayes model classifies very well. The ANN model is less effective because, on imbalanced data, it tends to classify only the majority class.

Table 9. Model Comparison After Oversampling

Model Accuracy Precision Recall

Naïve Bayes 86.55 86.92 86.55

ANN 87.21 87.25 87.21

Table 9 shows the second scenario, which uses oversampled data. The ANN and Naïve Bayes models have almost the same accuracy, precision, and recall values. This happens because all classes now have the same share of the data, so the models classify more effectively. The ANN model has improved and is more stable because the data is balanced. The Naïve Bayes model shows a slight decrease in performance, but its accuracy, precision, and recall values remain relatively stable.

Table 10. Model Comparison After Oversampling and Hyperparameter Tuning

Model Accuracy Precision Recall

Naïve Bayes 90.44 91.18 90.44

ANN 93.04 93.44 93.04

Table 10 is a comparison of the Hyperparameter Tuning results. The model performs training using the best parameters. With this training process, the model will produce a more accurate map. The result is that the ANN model produces better values than the Naïve Bayes model. It can be seen that the ANN model has higher accuracy, precision, and recall compared to the Naïve Bayes model.

3.3 Classification Map

Map generation in this research is based on covariance and compares three variogram models, namely Spherical, Exponential, and Gaussian. The map is generated with the model that gives the smallest RMSE value. The comparison is shown in Table 11:

Table 11. Comparison of Variogram Types

Model        Parameter    RMSE
ANN          Spherical    0.9746
ANN          Exponential  0.9752
ANN          Gaussian     0.9771
Naïve Bayes  Spherical    0.9381
Naïve Bayes  Exponential  0.9396
Naïve Bayes  Gaussian     0.9479

Based on the comparison in Table 11, both the ANN and Naïve Bayes maps use the Spherical model, which gives the smallest RMSE in each case. The congestion maps produced from the Naïve Bayes and ANN classification results with Simple Kriging interpolation are shown in Figure 4 and Figure 5.
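The selection step can be sketched as follows. The RMSE values are copied from Table 11, while the three variogram functions use common textbook forms with a "practical range" convention (the factor 3 in the exponential and Gaussian models); the exact parameterization used by ArcMap in this study is not stated, so those forms are an assumption.

```python
import math


def spherical(h, sill, a):
    """Spherical semivariogram: reaches the sill exactly at range a."""
    if h >= a:
        return sill
    r = h / a
    return sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, sill, a):
    """Exponential semivariogram with practical range a (assumed convention)."""
    return sill * (1.0 - math.exp(-3.0 * h / a))


def gaussian(h, sill, a):
    """Gaussian semivariogram with practical range a (assumed convention)."""
    return sill * (1.0 - math.exp(-3.0 * (h / a) ** 2))


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


# RMSE values reported in Table 11.
table_11 = {
    "ANN": {"Spherical": 0.9746, "Exponential": 0.9752, "Gaussian": 0.9771},
    "Naïve Bayes": {"Spherical": 0.9381, "Exponential": 0.9396, "Gaussian": 0.9479},
}
best_variogram = {model: min(scores, key=scores.get)
                  for model, scores in table_11.items()}
print(best_variogram)
```

The differences between the three models are in the third decimal place for both classifiers, so the choice of variogram has only a small effect on the reported error.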


Figure 4. ANN Congestion Map Classification

Figure 5. Naive Bayes Congestion Classification Map

The level of congestion is divided into four categories and marked with colors. Green indicates "Free Flow", yellow indicates "Steady Flow", orange indicates "Stable Controlled", and red indicates "Unstable". The two maps differ because they are built from different classification models with different accuracy levels. The most obvious difference is around the SP. M. Toha road. There the ANN classification map is dominated by orange, meaning "Stable Controlled", while the Naïve Bayes classification map is dominated by yellow, meaning "Steady Flow". This is influenced by the accuracy and precision of the classification results. The RMSE value also affects the prediction of the congestion maps. Because the accuracy and precision of the ANN method are close to those of the Naïve Bayes method, both maps give almost the same predictions.

Both methods use the same attributes to create the maps. Roads without a marked intersection (SP) can still be predicted through Simple Kriging interpolation. Each road prediction is computed by weighting 5 neighbors. The maps show that congestion in Bandung City tends to be heaviest in the southern area, precisely from around SP. M. Toha to SP. Samsat. In the eastern and northern areas of Bandung City, traffic tends to flow more smoothly than in the southern areas. The interpolation results are displayed visually as a map using the ArcMap application software.

The prediction results from ANN, Naïve Bayes, and Simple Kriging are reasonably good and are displayed in the form of a map. The developed ANN classification model has an accuracy of 93%, and Naïve Bayes has an accuracy of 90%. Compared to previous studies that only make predictions [6], [8] or only perform classification [7], this study also performs map visualization. In research [12], map making is only for regional predictions and is not based on roads. This research produces a prediction map of congestion classification as its output.

4. CONCLUSION

Imbalanced data affects the results of the classification methods used. Without the Random Oversampling technique, model performance is poor, with unstable accuracy, precision, and recall values. The Naïve Bayes method reaches an accuracy of 87.74%, while ANN only reaches 75.83%. The classification methods give stable results when the Random Oversampling technique is used. Although the accuracy of the Naïve Bayes method decreased to 86.55%, the model became more stable, with precision and recall values that are close to each other. The accuracy of the ANN method increased to 87.21% with Random Oversampling. Hyperparameter tuning raises accuracy further. ANN performs better than Naïve Bayes, but it takes longer to train. The results of this study show that the ANN algorithm is better than the Naïve Bayes algorithm, with accuracy, precision, and recall rates reaching 93%. Prediction maps using Simple Kriging are obtained from the Naive Bayes and Artificial Neural Network (ANN) classification results.

The predictions for ANN congestion classification have an RMSE value of 0.9746, while those for Naïve Bayes congestion classification have an RMSE value of 0.9381. Both methods produce almost the same map, with accuracy and RMSE values that are not much different. Although the ANN method produces higher accuracy than Naïve Bayes, the RMSE value of the Naïve Bayes method is better. With its smaller RMSE value, the Naïve Bayes map has a smaller error than the ANN map. The resulting classification map shows that in April 2022, the southern area of Bandung City experienced the highest congestion compared to the northern, western, and eastern areas. This is shown on the maps generated by both methods. The southern area of Bandung is primarily red, indicating the highest congestion level.

ACKNOWLEDGMENT

The author would like to thank Telkom University, Bandung City Transportation Agency and Area Traffic Control System (ATCS) Bandung City for supporting facilities and infrastructure in providing information and data so that this research can be completed.

REFERENCES

[1] N. E. Neviana and D. K. Soedarsono, “KEGIATAN KOMUNIKASI ATCS DALAM MENGURANGI PELANGGARAN LALU LINTAS DI KOTA BANDUNG ( Studi Deskriptif ATCS Kota Bandung Dalam Mengurangi Pelanggaran Lalu Lintas Menggunakan Pengeras Suara di Persimpangan ),” Proceeding of International Conference on Communication, Culture and Media Studies (CCCMS), vol. 7, no. 2, pp. 6969–6983, 2020.

[2] S. Cerdas, B. Konsep, F. Logic, U. Evaluasi, and K. Karyawan, “Email : roy.mubarak@eresha.ac.id,” vol. XI, no. 02, pp. 36–40, 2017.

[3] Y. Xing, X. Ban, X. Liu, and Q. Shen, “Large-scale traffic congestion prediction based on the symmetric extreme learning machine cluster fast learning method,” Symmetry (Basel), vol. 11, no. 6, pp. 1–19, 2019, doi: 10.3390/sym11060730.

[4] G. D. Ramady and R. G. Wowiling, “Analisa Prediksi Laju Kendaraan Menggunakan Metode Linear Regresion Sebagai Indikator Tingkat Kemacetan,” Jurnal Sekolah Tinggi Teknologi Mandala, vol. 12, no. 2, pp. 22–28, 2017.

[5] Y. Liu and H. Wu, “Prediction of road traffic congestion based on random forest,” Proceedings - 2017 10th International Symposium on Computational Intelligence and Design, ISCID 2017, vol. 2, pp. 361–364, 2018, doi: 10.1109/ISCID.2017.216.

[6] R. More, A. Mugal, S. Rajgure, R. B. Adhao, and V. K. Pachghare, “Road traffic prediction and congestion control using Artificial Neural Networks,” International Conference on Computing, Analytics and Security Trends, CAST 2016, pp. 52–57, 2017, doi: 10.1109/CAST.2016.7914939.

[7] M. Taamneh, S. Taamneh, and S. Alkheder, “Clustering-based classification of road traffic accidents using hierarchical clustering and artificial neural networks,” International Journal of Injury Control and Safety Promotion, vol. 24, no. 3, pp. 388–395, 2017, doi: 10.1080/17457300.2016.1224902.

[8] D. Dauletbak and J. Woo, “Big data analysis and prediction of traffic in Los Angeles,” KSII Transactions on Internet and Information Systems, vol. 14, no. 2, pp. 841–854, 2020, doi: 10.3837/tiis.2020.02.021.

[9] G. R. Septianto, F. F. Mukti, M. Nasrun, and A. A. Gozali, “Jakarta congestion mapping and classification from twitter data extraction using tokenization and naïve bayes classifier,” Proceedings - APMediaCast: 2015 Asia Pacific Conference on Multimedia and Broadcasting, no. April, pp. 14–19, 2015, doi: 10.1109/APMediaCast.2015.7210266.

[10] H. al Najada and I. Mahgoub, “Big vehicular traffic Data mining: Towards accident and congestion prevention,” 2016 International Wireless Communications and Mobile Computing Conference, IWCMC 2016, pp. 256–261, 2016, doi: 10.1109/IWCMC.2016.7577067.

[11] F. Falahatraftar, S. Pierre, and S. Chamberland, “A Centralized and Dynamic Network Congestion Classification Approach for Heterogeneous Vehicular Networks,” IEEE Access, vol. 9, pp. 122284–122298, 2021, doi: 10.1109/ACCESS.2021.3108425.

[12] S. S. Prasetiyowati, Y. Sibaroni, and S. Carolina, “Prediction and Mapping of Air Pollution in Bandung Using Generalized Space Time Autoregressive and Simple Kriging,” 2020.


[13] J. Luengo, D. García-Gil, S. Ramírez-Gallego, S. García, and F. Herrera, “Big Data Preprocessing Enabling Smart Data,” 2020.

[14] R. R. Rerung, “Penerapan Data Mining dengan Memanfaatkan Metode Association Rule untuk Promosi Produk,” Jurnal Teknologi Rekayasa, vol. 3, no. 1, p. 89, 2018, doi: 10.31544/jtera.v3.i1.2018.89-98.

[15] H. Hardiani, “Analisis Derajat Kejenuhan dan Biaya Kemacetan Pada Ruas Jalan Utama di Kota Jambi,” Jurnal Perspektif Pembiayaan dan Pembangunan Daerah , vol. 2, no. 4, 2016.

[16] D. A. Nasution, H. H. Khotimah, and N. Chamidah, “Perbandingan Normalisasi Data untuk Klasifikasi Wine Menggunakan Algoritma K-NN,” Computer Engineering, Science and System Journal, vol. 4, no. 1, p. 78, 2019, doi: 10.24114/cess.v4i1.11458.

[17] A. Pranolo, Universitas Ahmad Dahlan, Institute of Electrical and Electronics Engineers. Indonesia Section, and Institute of Electrical and Electronics Engineers, 2018 International Symposium on Advanced Intelligent Informatics (SAIN): “Revolutionize Intelligent Informatics Spectrum for Humanity”: proceeding: August 29-30, 2018, Yogyakarta, Indonesia.

[18] R. Dwi Fitriani, H. Yasin, D. Statistika, and F. Sains dan Matematika, “PENANGANAN KLASIFIKASI KELAS DATA TIDAK SEIMBANG DENGAN RANDOM OVERSAMPLING PADA NAIVE BAYES (Studi Kasus: Status Peserta KB IUD di Kabupaten Kendal),” vol. 10, no. 1, pp. 11–20, 2021.

[19] H. Zhang, J. Wei, X. Gao, and J. Hu, “The study of traffic flow model based on cellular automata and Naive Bayes,” International Journal of Modern Physics C, vol. 30, no. 5, pp. 1–14, 2019, doi: 10.1142/S0129183119500347.

[20] G. Wang and J. Kim, “The prediction of traffic congestion and incident on urban road networks using Naive Bayes classifier,” ATRF 2016 - Australasian Transport Research Forum 2016, Proceedings, no. November, pp. 1–14, 2016.

[21] S. Euis, U. Yuyun, and v. Apriade, “Penerapan Algoritma Artificial Neural Network untuk Klasifikasi Opini Publik Terhadap Covid-19,” Generation Journal, vol. 5, no. 2, pp. 109–118, Jul. 2021, doi: 10.29407/gj.v5i2.16125.

[22] S. H. Wang, K. Muhammad, J. Hong, A. K. Sangaiah, and Y. D. Zhang, “Alcoholism identification via convolutional neural network based on parametric ReLU, dropout, and batch normalization,” Neural Computing and Applications, vol. 32, no. 3, pp. 665–680, Feb. 2020, doi: 10.1007/s00521-018-3924-0.

[23] A. F. Agarap, “Deep Learning using Rectified Linear Units (ReLU),” Mar. 2018, [Online]. Available: http://arxiv.org/abs/1803.08375

[24] A. Kulkarni, D. Chong, and F. A. Batarseh, “Foundations of data imbalance and solutions for a data democracy,” in Data Democracy: At the Nexus of Artificial Intelligence, Software Development, and Knowledge Engineering, Elsevier, 2020, pp. 83–106. doi: 10.1016/B978-0-12-818366-3.00005-8.
