Crop Yield Prediction and Impact of Weather on Crops in Bangladesh

First, I would like to express my deepest gratitude to almighty God for giving me the opportunity to complete my final year thesis successfully. It is my privilege and honor to acknowledge and express my deep gratitude to Md Shohel Arman, Assistant Professor, Department of Software Engineering, Daffodil International University, Dhaka. My supervisor's deep knowledge of, and great interest in, the field of machine learning made it possible to complete this project.

Imran Mahmud, Head of the Faculty of Software Engineering, deserves my heartfelt thanks for kind help in completing my thesis. Last but not least, I would like to thank my parents for their support and patience; they have always been there for me.

After independence, agriculture was Bangladesh's main economic engine. As Bangladesh is an agricultural country, its economy and food security depend largely on the production level of various crops throughout the year. Crop yields depend heavily on the climate, and the weather has a significant influence on crop production. AI can boost the agricultural sector by improving incomes through growing the optimal crop. This paper is about predicting crop yield by applying different machine learning techniques. The predictions from the machine learning algorithms will tell the government and farmers how much crop yield to expect next year, by considering factors such as temperature, area, rainfall and humidity.

Based on my data, the most important features are area, rainfall, humidity and temperature. The algorithms used for time series forecasting are ARIMA, Long Short-Term Memory (LSTM) and Exponential Smoothing. Regression models are also used to forecast production from area, rainfall, humidity and temperature: Linear Regression, Lasso Regression, Random Forest Regression and Decision Tree.

INTRODUCTION

BACKGROUND
MOTIVATION OF THE RESEARCH
PROBLEM STATEMENT
RESEARCH QUESTIONS
RESEARCH OBJECTIVE
RESEARCH SCOPE
PREVIOUS LITERATURE
CONCLUSION

Weather is the state of the atmosphere over a short period of time, and climate is the way the atmosphere "behaves" over relatively long periods of time. The main aim of the proposed work is to compare the variation of climate with the production level of crops, and to obtain accurate results by using different methods. The predictive model is built from several features, and the parameters of the models are determined from historical data during the training phase.

For the testing phase, some of the historical data that was not used in training is used for performance evaluation. Nevertheless, total domestic production and per capita availability of food grains have grown in the country over the past several years. When growers have accurate information about the expected crop yield, the risk of loss is reduced.
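The training and testing phases described above can be illustrated with a chronological hold-out split. This is a minimal sketch rather than the thesis code: the function name `chronological_split`, the 20% test fraction, and the sample values are assumptions made for illustration.

```python
def chronological_split(series, test_fraction=0.2):
    """Split an ordered series into (train, test) without shuffling.

    The most recent observations are held out, so the model is
    evaluated only on data it has not seen during training.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, int(round(len(series) * test_fraction)))
    split_at = len(series) - n_test
    return series[:split_at], series[split_at:]


# Hypothetical yearly values, for illustration only
yearly_values = list(range(1, 11))
train, test = chronological_split(yearly_values, test_fraction=0.2)
print("train:", train)
print("test:", test)
```

For time series, the split is kept in time order so that the test set always lies after the training set.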

The main idea is to increase the productivity of the agricultural sector using machine learning models. However, their work does not implement any algorithms and, as a result, cannot give a clear understanding of the practicality of the proposed work.

RESEARCH METHODOLOGY

RESEARCH METHODOLOGY
DATA COLLECTION
DATA PREPROCESSING
Time Series Forecasting

ARIMA Model
ADF Tests
Differencing
ACF
PACF
LSTM
Decompose
Exponential Smoothing

Machine Learning Prediction
ML Model Selection
Predictive Model

Mohsen Shahhosseini, Guiping Hu, Isaiah Huber and Sotirios V. Archontoulis [6] showed improvements in yield prediction accuracy across all designed ML models when additional inputs from a simulation crop model were added. One could conclude that the inclusion of additional variables related to soil water can further improve ML yield prediction.

For time series forecasting, I split the data set into 5 data sets, each containing the date and one feature, i.e. area, precipitation, humidity, temperature and production. In time series analysis, data points are recorded at regular intervals over a period of time rather than randomly or irregularly.
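The split into five single-feature data sets could look like the sketch below. The column names (`date`, `area`, `rainfall`, `humidity`, `temperature`, `production`), the helper `split_by_feature`, and the two sample rows are hypothetical; the real data set may be laid out differently.

```python
import csv
import io

# Assumed column names; adjust to the actual data set
FEATURES = ("area", "rainfall", "humidity", "temperature", "production")


def split_by_feature(rows, date_key="date", features=FEATURES):
    """Return {feature: [(date, value), ...]}, one series per feature."""
    series = {name: [] for name in features}
    for row in rows:
        for name in features:
            series[name].append((row[date_key], float(row[name])))
    return series


# Made-up sample rows, for illustration only
sample_csv = """date,area,rainfall,humidity,temperature,production
2001,10.0,200.0,80.0,25.0,30.0
2002,11.0,210.0,79.0,25.5,32.0
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
datasets = split_by_feature(rows)
for name, values in datasets.items():
    print(name, values)
```

Each resulting series pairs the date with a single variable, which is the shape a univariate forecasting model expects.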

ARIMA models are statistical models used to analyze and forecast time series data. When a characteristic is measured regularly, such as daily, monthly, or annually, time series data is created. Similar to the ACF, the partial autocorrelation function (PACF) measures the correlation between two observations, but only the part of it that the shorter lags between those observations do not account for.
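To make differencing, the ACF and the PACF concrete, the following plain-Python sketch computes all three. It is illustrative only: the function names and the sample series are assumptions, the PACF uses the Durbin-Levinson recursion on the sample autocorrelations, and the ADF test itself is not implemented here.

```python
def difference(series, lag=1):
    """First-order differencing: x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series has zero variance")
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelation for lags 0..max_lag (Durbin-Levinson)."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_curr = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ]
        phi_curr.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_curr
    return result


# Made-up series with an upward trend, for illustration only
raw = [5.0, 7.0, 6.0, 9.0, 8.0, 11.0, 10.0, 13.0]
diffed = difference(raw)
acf_values = acf(diffed, 3)
pacf_values = pacf(diffed, 3)
print("differenced:", diffed)
print("ACF:", acf_values)
print("PACF:", pacf_values)
```

At lag 1 the PACF equals the ACF, since there are no shorter lags to remove; the two differ only from lag 2 onward.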

Partial autocorrelation plots can be used to specify both regression models with time series data and Auto-Regressive Integrated Moving Average (ARIMA) models.

Recurrent neural networks have a "short-term memory" in that they feed earlier outputs back into the network to inform the current step. LSTM reduces the vanishing gradient problem, which occurs when a neural network stops learning because the updates to its weights become progressively smaller.
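For reference, one common textbook formulation of the LSTM cell is given below. The symbols are assumptions introduced here, not notation from the thesis: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W$, $U$, $b$ the learned weights and biases.

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Because $c_t$ is updated additively, the gradient flowing back through the cell state is scaled by the forget gate $f_t$ rather than by repeated multiplication with the same weight matrix, which is why the gradient shrinks more slowly than in a plain recurrent network.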

Time series decomposition involves thinking of a series as a combination of level, trend, seasonality and noise components. Decomposition provides a useful abstract model for thinking about time series in general and for better understanding problems during time series analysis and forecasting. Exponential smoothing is a common technique for smoothing time series data.
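As an illustration of the decomposition idea, the sketch below performs a classical additive decomposition in plain Python. It makes several simplifying assumptions that are not from the thesis: the level is folded into the trend, the trend is a centered moving average, only odd periods are handled, and the sample series is synthetic.

```python
def moving_average_trend(series, window):
    """Centered moving average; None where the window does not fit."""
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd windows")
    half = window // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        trend[t] = sum(series[t - half:t + half + 1]) / window
    return trend


def additive_decompose(series, period):
    """Return (trend, seasonal, residual) with series = sum of the three."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods")
    trend = moving_average_trend(series, period)
    detrended = [
        x - m if m is not None else None for x, m in zip(series, trend)
    ]
    # Average the detrended values at each position within the cycle
    seasonal_means = []
    for pos in range(period):
        vals = [
            d for i, d in enumerate(detrended)
            if d is not None and i % period == pos
        ]
        seasonal_means.append(sum(vals) / len(vals))
    # Center the seasonal component so it sums to zero over one period
    offset = sum(seasonal_means) / period
    seasonal_means = [s - offset for s in seasonal_means]
    seasonal = [seasonal_means[i % period] for i in range(len(series))]
    residual = [
        x - m - s if m is not None else None
        for x, m, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic series: linear trend plus a repeating pattern of period 3
series = [t + [1, 0, -1][t % 3] for t in range(9)]
trend, seasonal, residual = additive_decompose(series, period=3)
print("trend:", trend)
print("seasonal:", seasonal)
print("residual:", residual)
```

On this synthetic input the residual is zero wherever the trend is defined, because the series was built from exactly a trend and a seasonal pattern.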

Exponential smoothing is an easily learned and easily applied procedure for making a determination based on the user's prior assumptions, such as seasonality. The raw data sequence is often represented by $\{x_{t}\}$, beginning at time $t=0$, and the output of the exponential smoothing algorithm is commonly written as $\{s_{t}\}$, which can be viewed as the best estimate of what the next value of $x$ will be.

Lasso regression adds a penalty term to the linear regression loss function, which can shrink coefficients to exactly zero (L1 regularization).
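The exponential smoothing recursion with $\{x_{t}\}$ and $\{s_{t}\}$ described above can be written, in its simplest form, as $s_{0} = x_{0}$ and $s_{t} = \alpha x_{t} + (1 - \alpha) s_{t-1}$ for $t > 0$, where the smoothing factor $\alpha$ lies in $(0, 1]$. The sketch below implements this simple (single) variant only; the function name, the value of `alpha`, and the sample data are assumptions, and the thesis may have used a variant with trend or seasonal terms.

```python
def simple_exponential_smoothing(xs, alpha):
    """Return the smoothed sequence s for raw observations xs.

    s[0] = xs[0]
    s[t] = alpha * xs[t] + (1 - alpha) * s[t - 1]
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not xs:
        raise ValueError("xs must not be empty")
    smoothed = [xs[0]]
    for x in xs[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up observations, for illustration only
observations = [10.0, 20.0, 30.0]
smoothed = simple_exponential_smoothing(observations, alpha=0.5)
print(smoothed)  # [10.0, 15.0, 22.5]
```

A larger `alpha` tracks recent observations more closely, while a smaller `alpha` gives a smoother series that reacts more slowly.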

Therefore, the random forest addresses both the bias and the variance components of the error and has proven to be robust. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node.
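How a regression tree picks the best predictor for its root node can be sketched as follows. This is an assumption-laden illustration rather than the thesis implementation: it scores each candidate split by the weighted variance of the target in the two child nodes, and the feature layout (`[rainfall, temperature]`) and the numbers are made up.

```python
def variance(values):
    """Population variance of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_root_split(rows, targets):
    """Return (feature_index, threshold, weighted_variance) of the best split.

    Every midpoint between consecutive distinct feature values is tried,
    and the split with the lowest weighted child variance wins.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, targets) if row[f] <= threshold]
            right = [y for row, y in zip(rows, targets) if row[f] > threshold]
            score = (len(left) * variance(left)
                     + len(right) * variance(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Made-up rows of [rainfall, temperature] and production targets
rows = [[100, 30], [120, 25], [300, 31], [320, 24]]
targets = [10, 11, 20, 21]
root = best_root_split(rows, targets)
print("root feature index:", root[0], "threshold:", root[1])
```

In this toy data the target follows rainfall (feature 0), so rainfall is selected as the root predictor. A random forest repeats this kind of tree construction on bootstrapped samples and averages the resulting predictions.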

Figure 1: Climate characteristics collected from the Climate Information System

RESULTS AND DISCUSSION

INTRODUCTION
RESULT DISCUSSION

Time Series model Forecasting
ML Model Forecasting

CONCLUSION
LIMITATIONS

It can be seen from Table III and Fig 1 that the prediction given by ARIMA is closer to the true values, as it has a lower average error than LSTM and Exponential Smoothing. ARIMA model: the model is built for temperature in Bangladesh, and the time series algorithms are trained on 50 years of data.
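The comparison of average errors can be sketched as below, assuming the reported average error is the mean absolute error (MAE). The function names are assumptions, and the model names and numbers are placeholders, not results from the thesis.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all time steps."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rank_models(actual, forecasts):
    """Return [(model_name, mae), ...] sorted from best to worst."""
    scores = {
        name: mean_absolute_error(actual, preds)
        for name, preds in forecasts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder values only; these are not the thesis results
actual = [10.0, 12.0, 14.0]
forecasts = {
    "model_a": [11.0, 12.0, 13.0],
    "model_b": [13.0, 9.0, 17.0],
}
ranking = rank_models(actual, forecasts)
for name, mae in ranking:
    print(f"{name}: MAE = {mae:.3f}")
```

The model with the lowest MAE on the held-out data is the one whose forecasts are, on average, closest to the true values.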

ARIMA model: the model is built for wind speed in Bangladesh, and the time series algorithms are trained. ARIMA model: the model is built for humidity in Bangladesh, and the time series algorithms are trained on 50 years of data. ARIMA model: the model is built for rainfall in Bangladesh, and the time series algorithms are trained on 50 years of data.

From Table IV and Figure 1, it is clear that the prediction given by ARIMA is closer to the real values, as it has a lower mean absolute error than LSTM and Exponential Smoothing. From Table V and Figure 1, it is likewise clear that the prediction given by ARIMA is closer to the real values, because it has a lower mean absolute error than LSTM and Exponential Smoothing.

Crop production prediction with ML models: crop production in Bangladesh over 50 years was selected, and the production quantity was predicted based on area, temperature, wind speed, precipitation and production.

Different models were trained on the data set to show the performance of each model on data for which the true values were known. It can be seen from Fig 2 that the Random Forest Regressor gives better results and thus outperformed all the other methods. The paper introduced various machine learning algorithms to predict crop yield on the basis of average temperature, wind speed and area.

Tests were conducted on the Bangladeshi government dataset, and it was shown that the Random Forest Regressor gives the best yield prediction accuracy. To find the best-performing model, models with more and fewer features should be tried. Most of the studies used a variety of machine learning models to test which model gave the best prediction.

This may be because the weather data are aggregated over the whole of Bangladesh. Next, I will try to obtain district-level data to achieve more accurate predictions; I will try to get the data for each district separately and use other models for comparison with the time series models.

Figure 2: Predicted production versus actual values

The most commonly used models are random forests, neural networks, linear regression, and gradient boosting trees.