WEATHER PREDICTION IN KOTA KINABALU USING LINEAR REGRESSIONS WITH MULTIPLE VARIABLES

50 01-009


map training examples. Thus, a model can be constructed and utilised to forecast the weather based on observed weather trends in the past.

Numerical weather prediction is a commonly used forecasting approach. Generally, it takes current meteorological conditions as input and processes them with a model of the atmosphere to predict future weather. Several machine learning methods have been used in a variety of related applications, including rainfall, weather, storm, solar radiation and flood prediction. Weather forecasting can be viewed as a form of predictive data mining, which involves data analysis, database construction and predicting the characteristics of unseen data.

The main objective of this paper is to predict the weather in Kota Kinabalu from temperature, dew point, relative humidity, wind speed and atmospheric pressure using linear regression, with the model parameters obtained by the normal equation. The main contribution is the formulation of an effective weather forecast model based on linear regression. To assess the models' effectiveness, this study compares the hypothesis fitted with the normal equation against the one fitted with gradient descent.

Although gradient descent is the most widely used method for fitting linear regression models, this work uses the normal equation method to forecast weather conditions more precisely, making forecasts more accurate and more useful for everyday activities.

RELATED WORK

This section reviews weather prediction methods proposed by different researchers in the literature. These include forecasting methods based on linear regression that use different weather variables, together with their reported experimental outcomes. Shafin [3] studied temperature prediction models using several algorithms, including linear regression, polynomial regression, isotonic regression and support vector regression. Holmstrom and Liu [4] proposed a linear regression model and a variant of a functional linear regression model to predict the maximum and minimum temperatures for the next seven days from the previous couple of days' data. In [5], rain was predicted with a rule-based machine learning approach built on a decision tree model. Bagirov et al. [6] applied clusterwise linear regression, which combines clustering and regression techniques, to predict monthly rainfall.

Linear regression is the most basic and most frequently used predictive model. Regression estimates are used to describe data and to shed light on the relationship between a dependent variable and one or more independent variables. Linear regression finds the line that best fits the data points, typically by minimizing the sum of squared errors. This best-fit line is called the regression line.
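For a single independent variable, the best-fit line can be computed directly from the least-squares formulas. The slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means. The sketch below illustrates this under that single-variable assumption. The function `fit_line` and the sample points are hypothetical and are not taken from this study's data.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = cov(x, y) / var(x); the common 1/n factors cancel, so raw sums suffice.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical points scattered around y = 1 + 2x (not data from this study).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
intercept, slope = fit_line(xs, ys)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

For these points the fit gives an intercept of about 1.09 and a slope of about 1.94. With several independent variables, the same least-squares idea leads to the normal equation used later in this paper.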

Depending on the data, the fitted relationship may be a straight line or a curve. The model can include quadratic or higher-order polynomial terms, which may fit the data more closely and so give more accurate answers to the user's queries. This study uses two optimization methods to fit the model parameters: the normal equation and gradient descent.

The accuracy of the model obtained with each optimization algorithm is evaluated using the root mean square error (RMSE).

METHODOLOGY & PROPOSED WORK

The basic framework of the proposed weather prediction model is illustrated in Figure 1. The framework consists of data acquisition, data selection, data processing, model training, model evaluation and display of the final result. The weather data were acquired from the Weather Underground website (https://www.wunderground.com), which uses meteorological data from local weather stations [7]. This study used data from the years 2020 and 2021 for the city of Kota Kinabalu, the capital of Sabah. The data include the monthly averages of temperature (°F), dew point (°F), relative humidity (%), wind speed in kilometres per hour (km/h), atmospheric pressure in millibars (mbar) and visibility (km).

Figure 1. The framework of the weather prediction model

Two algorithms, gradient descent and the normal equation, are applied to the original dataset so that their efficiency can be compared. The proposed model for temperature is shown in Equation 1:

$$\mathrm{AvgTemp} = a + b\,(\mathrm{AvgDew}) + c\,(\mathrm{AvgHum}) + d\,(\mathrm{AvgPres}) + e\,(\mathrm{AvgVis}) + f\,(\mathrm{AvgWind}) \tag{1}$$

where a, b, c, d, e and f are the model parameters. The optimization algorithm learns these parameters from the training data fed to it.

Three types of weather parameters are predicted: temperature, humidity and dew-point.

Temperature is a measure of the degree of heat or coldness. The most commonly used temperature units are Celsius and Fahrenheit. Humidity refers to the amount of water vapour in the atmosphere. Relative humidity, used here, expresses it as a percentage of the amount the air could hold at saturation at the same temperature. The dew point is the temperature to which a volume of air at a given atmospheric pressure must be cooled to become saturated with water vapour, at which point condensation occurs and water droplets begin to form. Table 1 shows sample data from the training set. These data are fed into the model, and the parameters are updated so that the cost function is minimized.
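Table 1 reports temperature and dew point in degrees Fahrenheit. A small conversion helper can make these values easier to compare with Celsius readings. The sketch uses the standard relation °C = (°F - 32) × 5/9. The function names are hypothetical.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


# January 2020 average temperature from Table 1, in °F.
jan_avg_temp_f = 75.69
jan_avg_temp_c = fahrenheit_to_celsius(jan_avg_temp_f)
print(f"{jan_avg_temp_f} °F = {jan_avg_temp_c:.2f} °C")
```

The January average of 75.69 °F corresponds to roughly 24.27 °C.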

Table 1. Sample of training data (2020)

Month  Avg Temp (°F)  Avg Dew (°F)  Avg Hum (%)  Avg Pres (mbar)  Avg Vis (km)  Avg Wind (km/h)
Jan    75.69          75.69         81.52        1010.2           9.7           4.41
Feb    74.69          74.69         78.50        1011.3           9.8           4.54
Mar    75.79          75.79         80.48        1010.2           9.7           4.65
Apr    77.49          77.49         81.05        1010.1           9.7           4.99
May    78.47          78.47         84.35        1009.3           9.3           4.95
Jun    76.56          76.56         86.61        1009.1           8.9           4.88
Jul    76.25          76.25         87.62        1008.5           8.6           5.05
Aug    76.15          76.15         83.29        1009.0           9.2           4.9
Sep    76.11          76.11         85.71        1009.1           8.8           5.15
Oct    75.22          75.22         85.01        1008.6           9.2           5.57
Nov    76.24          76.24         86.61        1009.1           9.1           4.77
Dec    75.84          75.84         86.08        1008.3           9.4           4.94

The linear regression hypothesis function can be expressed in Equation 2:

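$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n \tag{2}$$

where $x_1, \dots, x_n$ are the independent variables, $x_0 = 1$ so that $\theta_0$ acts as the intercept, and $\theta_0, \theta_1, \dots, \theta_n$ are the parameters to be optimized, one for each variable.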
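Below is a minimal sketch of Equation 2. It assumes the convention x₀ = 1, which matches the column of ones in the design matrix of the normal equation. The function name `hypothesis` and the parameter values are hypothetical. With θ = (a, b, c, d, e, f), the same function evaluates the temperature model of Equation 1.

```python
from typing import Sequence


def hypothesis(theta: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate h_theta(x) = theta_0*x_0 + theta_1*x_1 + ... + theta_n*x_n.

    `features` holds x_1..x_n only; x_0 = 1 is prepended here so that
    theta_0 acts as the intercept.
    """
    x = [1.0, *features]
    if len(theta) != len(x):
        raise ValueError("theta must have exactly one more entry than features")
    return sum(t * xi for t, xi in zip(theta, x))


# Equation 1 corresponds to theta = (a, b, c, d, e, f) and
# features = (AvgDew, AvgHum, AvgPres, AvgVis, AvgWind).
# These numbers are hypothetical, not parameters fitted in this study.
theta = [2.0, 0.5, -0.25, 0.0625, 1.0, 0.5]
features = [70.0, 80.0, 1000.0, 9.5, 5.0]
print(hypothesis(theta, features))  # 91.5
```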

The least-squares cost function used in linear regression is given by Equation 3:

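$$J(\theta_0, \theta_1, \dots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) - y^{(i)} \right]^2 \tag{3}$$

where $m$ is the number of training samples, $x^{(i)}$ is the input (feature vector) of the $i$-th training sample and $y^{(i)}$ is the observed output of the $i$-th sample.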
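The cost in Equation 3 can be computed directly, as in the sketch below. It assumes each row of the input already contains x₀ = 1. The function `cost` and the toy data are hypothetical. A perfect fit gives J = 0, and the cost grows with the squared prediction errors.

```python
from typing import Sequence


def cost(theta: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Least-squares cost J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2.

    Each row of X must already start with x_0 = 1.
    """
    m = len(y)
    if m == 0 or len(X) != m:
        raise ValueError("X and y must be non-empty with the same number of rows")
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(t * xj for t, xj in zip(theta, row))
        total += (prediction - target) ** 2
    return total / (2 * m)


# Hypothetical data generated from y = 1 + 2x, so theta = (1, 2) fits exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
print(cost([1.0, 2.0], X, y))  # 0.0, perfect fit
print(cost([0.0, 2.0], X, y))  # 0.5, every prediction is 1 too low
```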

The optimization techniques used in this study are the normal equation and gradient descent. The normal equation is the analytical (closed-form) solution of the linear regression problem. It gives the parameters that minimize the least-squares cost function of Equation 3, and it is obtained by setting the gradient of the cost to zero. The normal equation is given by Equation 4:

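$$\theta = (X^{T}X)^{-1}X^{T}y \tag{4}$$

where

$$X = \begin{pmatrix} 1 & x_1^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}, \qquad y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{pmatrix},$$

and $x_j^{(i)}$ is the value of the $j$-th feature in the $i$-th training sample.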
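Below is a plain-Python sketch of Equation 4. Rather than forming (XᵀX)⁻¹ explicitly, it solves the equivalent linear system (XᵀX)θ = Xᵀy by Gaussian elimination with partial pivoting. This is a common and numerically safer substitute that gives the same θ whenever XᵀX is invertible. The function `normal_equation`, the singularity tolerance and the synthetic data are assumptions made for illustration. This is not the MATLAB code used in this study.

```python
from typing import List, Sequence


def normal_equation(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Solve (X^T X) theta = X^T y for theta, i.e. Equation 4.

    Rows of X must start with the constant 1. The system is solved by
    Gaussian elimination with partial pivoting instead of an explicit inverse.
    """
    m, p = len(X), len(X[0])
    # Build A = X^T X (p x p) and b = X^T y (length p).
    A = [[sum(X[i][r] * X[i][c] for i in range(m)) for c in range(p)] for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(m)) for r in range(p)]

    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:  # heuristic tolerance (assumption)
            raise ValueError("X^T X is singular; features may be linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution on the upper-triangular system.
    theta = [0.0] * p
    for r in range(p - 1, -1, -1):
        acc = sum(A[r][c] * theta[c] for c in range(r + 1, p))
        theta[r] = (b[r] - acc) / A[r][r]
    return theta


# Hypothetical noise-free data from y = 1 + 2*x1 - 3*x2.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1 + 2 * row[1] - 3 * row[2] for row in X]
print([round(t, 6) for t in normal_equation(X, y)])
```

On this noise-free data the recovered parameters should be close to (1, 2, -3). If two feature columns are exactly collinear, XᵀX is singular, and the function raises an error instead of returning a solution.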

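Equation 5 shows the gradient descent update rule, which is applied simultaneously for every $j = 0, 1, \dots, n$:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) - y^{(i)} \right] x_j^{(i)} \tag{5}$$

where $\alpha$ is the learning rate. It sets the step size and therefore affects whether, and how quickly, the algorithm converges. Its value is chosen between 0 and 1. Because the least-squares cost is convex, the minimum that gradient descent converges to is the global minimum.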
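The update rule in Equation 5 can be sketched as batch gradient descent, where every θⱼ is updated from the same set of residuals. The default learning rate of 0.3 mirrors the value used in this study. The 5,000 iterations, the function name `gradient_descent` and the small-valued synthetic features are assumptions. With a suitable learning rate and enough iterations, gradient descent approaches the parameters given by the normal equation. With unscaled features of very different magnitudes (for example, pressure near 1010 mbar alongside visibility near 9 km), a much smaller learning rate is needed for convergence, so features are commonly standardized first.

```python
from typing import List, Sequence


def gradient_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    alpha: float = 0.3,
    iterations: int = 5000,
) -> List[float]:
    """Batch gradient descent for linear regression (Equation 5).

    Rows of X start with x_0 = 1. All theta_j are updated simultaneously
    from residuals computed with the same (old) parameters.
    """
    m, p = len(X), len(X[0])
    theta = [0.0] * p
    for _ in range(iterations):
        # Residuals h_theta(x^(i)) - y^(i) under the current parameters.
        residuals = [sum(t * xj for t, xj in zip(theta, row)) - target
                     for row, target in zip(X, y)]
        # Gradient component j: (1/m) * sum_i residual_i * x_j^(i).
        gradient = [sum(r * row[j] for r, row in zip(residuals, X)) / m for j in range(p)]
        # Simultaneous update of every theta_j.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Hypothetical noise-free data from y = 1 + 2*x1 - 3*x2, features in [0, 2].
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1 + 2 * row[1] - 3 * row[2] for row in X]
print([round(t, 4) for t in gradient_descent(X, y)])
```

Here the features are small, so α = 0.3 converges, and the result should be close to (1, 2, -3).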

The accuracy of all methods is evaluated using the root mean square error (RMSE) [8-9]. RMSE measures the typical magnitude of the differences between the predicted and observed values, and it indicates the model's ability to predict the output for new input data [10]. Because RMSE is scale-dependent, it is a good measure of accuracy for comparing the prediction errors of different models for the same variable, but not across variables measured on different scales [11-12]. RMSE is given by Equation 6:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) - y^{(i)} \right]^2} \tag{6}$$

In general, a lower RMSE indicates a more accurate regression model.
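Equation 6 translates into a few lines of code. Comparing it with Equation 3 shows that RMSE = √(2J) when both are evaluated on the same samples, so minimizing the cost also minimizes the training RMSE. The function name `rmse` and the numbers below are hypothetical.

```python
import math
from typing import Sequence


def rmse(predictions: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error (Equation 6): sqrt((1/m) * sum_i (pred_i - obs_i)^2)."""
    if len(predictions) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    m = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predictions, observed)) / m)


# Hypothetical predictions against hypothetical observations.
observed = [75.0, 76.0, 77.0, 78.0]
predictions = [75.5, 75.5, 77.5, 77.5]
print(rmse(predictions, observed))  # 0.5, every error is 0.5 in magnitude
```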

RESULTS

MATLAB is used to train and evaluate the models. The learning rate and the number of iterations of the gradient descent algorithm are set to 0.3 and 50,000, respectively. Three quantities are predicted with the linear regression hypothesis: temperature, humidity and dew point. For each quantity, the computation time and the root mean square error of each algorithm are recorded. Table 2 shows the results for temperature. The normal equation produces predictions very close to the observed values (RMSE 0.15220), whereas the error of gradient descent is far larger (RMSE 25.77815). Gradient descent also takes longer to complete the task than the normal equation (0.64311 s versus 0.00336 s).

Table 2. Temperature

Parameter       Gradient Descent  Normal Equation
a               82.30583          -16.72338
b               0.88995           1.01777
c               -1.17833          -0.36773
d               0.05386           0.05190
e               -0.05610          -0.17593
f               0.11068           0.13683
RMSE            25.77815          0.15220
Time taken (s)  0.64311           0.00336

Humidity is important in forecasting rainfall and the formation of droplets in the atmosphere. It is also related to temperature: for a given amount of water vapour, relative humidity falls as the temperature rises, and vice versa. Table 3 shows the parameters computed with humidity as the dependent variable. The normal equation again produces results very close to the observed humidity (RMSE 4.14847, compared with 134.529833 for gradient descent). As with temperature, gradient descent also needs more processing time to complete training than the normal equation (0.72239 versus 0.00323).

Table 3. Humidity

Parameter       Gradient Descent  Normal Equation
a               82.10750          133.18894
b               2.21314           2.53100
c               -3.26850          -2.51645
d               -0.03212          -0.03095
e               -0.21823          -0.68443
f               0.13335           0.16485
RMSE            134.529833        4.14847
Time taken (s)  0.72239           0.00323

The dew point is also an important aspect of weather forecasts. It indicates the amount of water vapour in the air: the higher the dew point, the more moisture the air contains. Table 4 shows the results for dew point. A comparison of the root mean square errors (118.95769 for gradient descent versus 0.14654 for the normal equation) shows that the gradient descent results are unsuitable for predicting dew point in this study. In contrast, the results produced by the normal equation are highly reliable.

The normal equation also takes less time to compute the parameters than gradient descent (0.00330 versus 0.66567).

Table 4. Dew-point

Parameter       Gradient Descent  Normal Equation
a               75.97917          39.18630
b               1.22732           0.94493
c               1.10032           0.34339
d               -0.07115          -0.06856
e               0.04292           0.13460
f               -0.11804          -0.14593
RMSE            118.95769         0.14654
Time taken (s)  0.66567           0.00330

CONCLUSION

Efficient and accurate weather prediction is vitally important to many sectors in Kota Kinabalu, particularly agriculture, water management, aquaculture and tourism, which could drive the city's economy. The variables used in this study's dataset are temperature, dew point, relative humidity, visibility, wind speed and atmospheric pressure. This study proposes an efficient and accurate weather prediction model based on linear regression, with parameters obtained from the normal equation. The normal equation proved highly effective, producing reliable predictions of temperature, humidity and dew point. The work could be extended to a larger region and network, which would be particularly beneficial to farmers in rural agricultural areas of Sabah.

Other machine learning strategies could also be considered to improve prediction accuracy, together with additional input features that affect the analysis. Examples include precipitation, solar radiation, geographical variables (latitude, longitude and altitude), cloudiness and carbon dioxide emissions.

REFERENCES

[1] Djamila, H., Ming, C.C. and Kumaresan, S. Estimation of exterior vertical daylight for the humid tropic of Kota Kinabalu city in East Malaysia. Renew. Energy, 36(1) (2011), 9–15.

[2] Vun Teong, K., Sukarno, K., Hian Wui Chang, J., Pien Chee, F., Mun Ho, C. and Dayou, J. The monsoon effect on rainfall and solar radiation in Kota Kinabalu. Transactions of Science & Tech., 4(4) (2017), 460–465.

[3] Shafin, A. Machine learning approach to forecast average weather temperature of Bangladesh. Global Journal of Computer Science and Technology: D Neural & Artificial Intelligence, 19 (2019), 39–48.

[4] Holmstrom, M. and Liu, D. Z. Machine learning applied to weather forecasting (2016).

[5] Anwar, M., Nugrohadi, S., Tantriyati, V. and Windarni, V. Rain prediction using rule-based machine learning approach. Adv. Sustain. Sci. Eng. Technol. 2 (2020).

[6] Bagirov, A. M., Mahmood, A. and Barton, A. Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach. Atmos. Res. (2017), 20–29.

[7] Wunderground. Kota Kinabalu, Malaysia weather conditions / Weather Underground. https://www.wunderground.com/weather/my/kota-kinabalu (accessed Jun. 28, 2021).

[8] Zhao, J. and Liu, X. A hybrid method of dynamic cooling and heating load forecasting for office buildings based on artificial intelligence and regression analysis. Energy Build., 174 (2018), 293–308.

[9] Jakaria, A., Hossain, M. M. and Rahman, M. Smart weather forecasting using machine learning: A case study in Tennessee. ArXiv, 1 (2020).

[10] Granderson, J., Touzani, S., Custodio, C., Sohn, M., Fernandes, S. and Jump, D. Assessment of automated measurement and verification (M&V) methods (2015).

[11] Neill, S. P. and Hashemi, M. R. Chapter 8 - Ocean modelling for resource characterization. In: Fundamentals of Ocean Renewable Energy, S. P. Neill and M. R. Hashemi, Eds. Academic Press (2018), 193–235.

[12] Chai, T. and Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 7(3) (2014), 1247–1250.


In the document ppst stem seminar 2021 - OER@UMS Home (pages 53-60)