This thesis, entitled "Machine Learning Approach to Predict Rainfall Amount of Dhaka" and submitted by Abdul Jabbar Gazi, ID to the Department of Software Engineering, Daffodil International University, has been accepted as satisfactory in partial fulfillment of the requirements for the degree of B. Sc.

I hereby declare that I have undertaken this thesis under the guidance of Asif Khan Shakir, Associate Professor, Department of Software Engineering, Daffodil International University. I also declare that neither this thesis nor any part of it has previously been submitted elsewhere for the award of any degree.
I am especially indebted to Daffodil International University and to my respected teacher Asif Khan Shakir for his constant guidance and supervision.

Predicting the amount of rainfall is very important in a country like Bangladesh, where 50% of the people are farmers. I used several machine learning algorithms to predict the future rainfall of Dhaka: simple linear regression, multivariate linear regression, and polynomial regression.
Since precipitation depends on several weather attributes, we tried different attributes as the predictor in linear regression and polynomial regression to determine which attribute gives the best result. For multivariate linear regression, we used all available attributes that are related to rainfall.
- Introduction
- Research Questions
- Research Objectives
- Organization of the Thesis
Rainfall is one of nature's best gifts and many countries depend on it for agriculture. Accurate rainfall forecasting is important in daily life. It has a major impact on agriculture as well as natural disaster management. Cultivation in Bangladesh is largely dependent on rainfall and it is important to predict whether it will rain or not.
An empirical approach analyzes historical precipitation data and its relationship to various atmospheric and oceanic variables. The most commonly used empirical approaches for forecasting are regression, artificial neural networks, fuzzy logic, and ensemble methods of data processing. In this paper, we aim to build a model that can predict the average rainfall in Dhaka.
The data used in this paper contain daily maximum temperature, minimum temperature, relative humidity, wind speed, cloud cover, and sunshine from 1953 to 2013. For each month, we summed the daily values of every attribute and divided the total by the number of days in that month to obtain that month's weather information.
- Related Works
- Summary of Most Related Work
This article focuses on finding the best regression model for predicting rainfall, and the attribute that best predicts it, using two machine learning algorithms: simple linear regression and polynomial regression.
- Methodology model
- Dataset
- Pre-processing
- Data visualization and statistics
- Dataset statistical overview
- Estimator selection
This dataset contains daily maximum temperature, daily minimum temperature, daily relative humidity, daily cloud cover, daily wind speed, and daily bright sunshine. It covers numerous weather stations in Bangladesh, including Barishal, Bhola, Bogra, Chandpur, Chittagong, Comilla, Dhaka, Dinajpur, and Faridpur. To obtain monthly values, we first summed the data for all days of a month and then divided the total by the number of days in that month.
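The monthly aggregation described above can be sketched as follows. This is a minimal illustration, assuming that "dividing by month" means dividing each monthly sum by the number of daily readings in that month. The function name `monthly_means` and the sample readings are hypothetical and are not taken from the thesis dataset.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily_records):
    """Average daily readings per (year, month).

    daily_records: iterable of (date, value) pairs for one attribute.
    Returns a dict mapping (year, month) to the mean of that month's values.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily_records:
        key = (day.year, day.month)
        totals[key] += value
        counts[key] += 1
    # Assumption: divide each monthly sum by the number of recorded days.
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical maximum-temperature readings, for illustration only.
sample = [
    (date(2013, 1, 1), 24.0),
    (date(2013, 1, 2), 26.0),
    (date(2013, 2, 1), 28.0),
    (date(2013, 2, 2), 30.0),
    (date(2013, 2, 3), 32.0),
]
monthly = monthly_means(sample)
print(monthly)
```

Dividing by the count of recorded days, rather than by the calendar length of the month, keeps the average meaningful when some daily readings are missing.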
Here we show the relationship between precipitation and maximum temperature, minimum temperature, relative humidity, wind speed, cloud cover, and sunshine, respectively.

©Daffodil International University

We used the following estimators to train on our data and predict future rainfall amounts. Linear regression is a tool for determining the relationship between a dependent variable and one or more independent variables.
For a simple linear regression, x is the predictor (the independent variable) and y is the dependent variable. Polynomial regression is a type of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial in x. Polynomial regression fits a non-linear relationship between the value of x and the corresponding conditional mean of y [1].
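To make the difference between the two estimators concrete, the sketch below fits both by ordinary least squares using the normal equations. It is an illustrative implementation only, not the code used in this thesis. The helper names (`solve_linear_system`, `fit_polynomial`, `predict`) and the sample data are assumptions introduced here. Simple linear regression is treated as the degree-1 case of the polynomial fit.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular; no special handling otherwise.
    """
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree]."""
    size = degree + 1
    # Normal equations (X^T X) b = X^T y, where X has columns x**0 .. x**degree.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(size)]
           for i in range(size)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate b0 + b1*x + b2*x**2 + ... at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Hypothetical data that follow y = 1 + 2x + 3x^2 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

linear = fit_polynomial(xs, ys, degree=1)     # simple linear regression
quadratic = fit_polynomial(xs, ys, degree=2)  # polynomial regression, n = 2
print(linear, quadratic)
```

On curved data like this, the degree-2 fit recovers the generating coefficients while the straight line can only approximate the trend, which is the kind of gap the comparison in this thesis is meant to expose.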
Multivariate linear regression works like simple linear regression, except that it has more than one independent variable. To build the multiple linear regression equation, the parameters are estimated from the training data, and the variables are selected from the dataset using correlation.

The linear correlation coefficient measures the strength of the linear relationship between two variables.
The coefficient of determination measures how well the regression line represents the data. If the regression line passes through every point on the scatter plot, it explains all of the variation.
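The two quantities just described can be computed as in the following sketch. It uses the standard definitions of the Pearson correlation coefficient and of the coefficient of determination, R² = 1 − SS_res / SS_tot. The function names and sample values are hypothetical and do not come from the thesis data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical values, for illustration only.
attribute = [1.0, 2.0, 3.0, 4.0]
rainfall = [2.0, 4.0, 6.0, 8.0]

r = pearson_r(attribute, rainfall)
perfect_fit = r_squared(rainfall, rainfall)           # line through every point
mean_only = r_squared(rainfall, [5.0, 5.0, 5.0, 5.0])  # always predicts the mean
print(r, perfect_fit, mean_only)
```

A model whose predictions match every observation gets R² = 1, while a model that always predicts the mean of the observations gets R² = 0.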
- Result analysis
- Performance Table
- Multivariate Linear regression
- Findings and Contribution
- Future works
From the figure, it is clear that polynomial regression performs better when maximum temperature is used as the independent variable. In Figure 4.1.2 we used minimum temperature as the independent variable, and it is clear that the polynomial model gives a better result than simple linear regression. In Figure 4.1.3 we used relative humidity as the independent variable, and we can observe that the polynomial model performs better than simple linear regression.
In Figure 4.1.4 we used cloud cover as the independent variable, and it is not clear from the figure which model performs better. In Figure 4.1.5 we used wind speed as the independent variable, and we can see that the polynomial model performs better than simple linear regression. In Figure 4.1.6 we used bright sunshine as the independent variable, and it is not clear from the figure which model performs better.
In the analysis of the results, we used four evaluation metrics to score each estimator. The mean squared error is the average of the squared differences between the actual values and the predicted values. The mean absolute error measures the average magnitude of the errors in a given set of forecasts without considering their direction.
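The error metrics can be sketched as below, using their standard definitions. Root mean squared error is included because the analysis also reports it; it is the square root of the mean squared error. The function names and the sample values are illustrative assumptions, not results from this thesis.

```python
from math import sqrt


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error, in the units of the target."""
    return sqrt(mean_squared_error(actual, predicted))


def mean_absolute_error(actual, predicted):
    """Average magnitude of the errors, ignoring their direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical rainfall values, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(mse, rmse, mae)
```

Because the errors are squared before averaging, MSE and RMSE penalize large misses more heavily than MAE does, so the two can rank models differently.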
R-squared is used to find how well the regression model fits the observed data. From the data in all the above tables, we observed that polynomial regression worked well with most of the independent variables. Simple linear regression gave good results only when bright sunshine was used as the independent variable.
For mean squared error, root mean squared error, and mean absolute error, a lower value means better performance, while a higher R-squared value means better performance for the model. With cloud cover as the independent variable, polynomial regression outperforms the simple linear model. We conducted this experiment to determine the accuracy of precipitation forecasts using multivariate linear regression.
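For reference, a multivariate linear model of the kind evaluated in this experiment can be written as follows. The notation is assumed here for illustration: the x_i stand for the six weather attributes in the dataset (maximum temperature, minimum temperature, relative humidity, wind speed, cloud cover, and bright sunshine), the β_i are coefficients estimated from the training data, and ŷ is the predicted rainfall. Treating all six attributes as predictors is an assumption based on the dataset description.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
        + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6
```

Simple linear regression is the special case in which only one of the x_i is kept.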
My objective in this study was to create the best machine learning model for predicting rainfall in Dhaka and to find the weather attribute that is best correlated with rainfall and gives the best rainfall forecast. Multivariate linear regression gives the best result among the three regressions discussed in this paper.