Abstract
The Indian summer monsoon (ISM) is a complex climatic phenomenon. The Indian subcontinent receives 70% of its rainfall from the ISM. India is a largely agricultural country, so the monsoon is critical to its economy. Prediction of summer monsoon rainfall plays a significant role in planning agriculture, irrigation, food storage, and infrastructure development in the country.
Prediction of summer monsoon rainfall remains a challenging task. The monsoon is a non-linear phenomenon affected by many global long-range effects, and the changing dynamics of tropical climate in the climate change era make prediction even more difficult. There are two major approaches to prediction: the first is based on numerical, physics-based global circulation models, and the second is based on statistical prediction using global climatic predictors of the monsoon. The performance of the two approaches is currently comparable for the Indian monsoon.
In this thesis we explore machine learning based statistical prediction models. We first attempt to forecast the national aggregate rainfall over the months of June to September. We also predict the early and late monsoons, spanning June-July and August-September, respectively. Regional rainfall in the central, north-east, north-west, and south-peninsular parts of the country is also predicted. Finally, we try to predict the active and break spells of the monsoon over the summer months.
Machine learning based monsoon prediction consists of two major steps: identification of monsoon predictors, and development of prediction algorithms using these predictors. The thesis addresses both. In identifying the climatic predictors and developing the prediction algorithms, effort is made to address the challenges specific to monsoon prediction. These challenges include the variable and time-evolving nature of the interaction between global climatic variables and the Indian monsoon, the non-linear and complex nature of the phenomenon, the presence of teleconnections and long-range dependencies, and finally, the uncertainties involved in the phenomenon.
The major contributions of the thesis are the following. A recurrent neural network based approach is first proposed for prediction of national aggregate rainfall. The three operational models of the India Meteorological Department have mean absolute errors of 7.5%, 7.1%, and 6.5%, respectively, while the recurrent network has an error of 3.3% in predicting the summer monsoon.
Next, to model the time-evolving influence of global climatic variables on the monsoon, we adopt an ensemble approach. Years with similar characteristics are grouped together using fuzzy and subspace clustering. Models built on these groups of years are combined into an ensemble to obtain the prediction. The approach reports an error of 3.4% on the national aggregate monsoon prediction.
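The abstract does not state how the per-group models are combined. As a minimal sketch of one possible scheme, assuming each group of years has its own model and a new year has a fuzzy membership in each group, the ensemble prediction can be taken as the membership-weighted average of the individual predictions. The function name `ensemble_predict`, the two toy linear models, and the membership values are hypothetical and serve only as illustration.

```python
from typing import Callable, Sequence

# A model maps a vector of predictor values to a rainfall estimate.
Model = Callable[[Sequence[float]], float]


def ensemble_predict(
    features: Sequence[float],
    models: Sequence[Model],
    memberships: Sequence[float],
) -> float:
    """Membership-weighted average of the per-group model predictions."""
    if len(models) != len(memberships):
        raise ValueError("need exactly one membership value per model")
    total = sum(memberships)
    if total <= 0:
        raise ValueError("memberships must sum to a positive value")
    weighted = sum(w * m(features) for m, w in zip(models, memberships))
    return weighted / total


# Hypothetical models for two groups of years (illustration only).
def model_group_a(x: Sequence[float]) -> float:
    return 90.0 + 10.0 * x[0]


def model_group_b(x: Sequence[float]) -> float:
    return 100.0 - 20.0 * x[0]


# Assumed fuzzy memberships of the target year in the two groups.
prediction = ensemble_predict([1.0], [model_group_a, model_group_b], [0.75, 0.25])
print(prediction)
```

With these toy values the two models predict 100 and 80, and the weighted average is 95. Weighting by membership lets a year that resembles several groups draw on all of their models rather than on a single hard-assigned one.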
To capture long-range dependencies when identifying new climatic predictors, we next propose a climate network based approach. A complex network is built using geographical grids as nodes and the similarity between their climatic variables as edge weights. Community detection is performed on this network to extract new predictors influencing the Indian monsoon.
These predictors yield a mean absolute error of 4.3% in forecasting the summer monsoon with an ensemble of bagged regression trees. The discovered predictors are found to be meteorologically significant.
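The construction of a climate network can be sketched as follows. This is an illustrative toy only, under stated assumptions: similarity is taken to be the absolute Pearson correlation between the time series of two grids, edges below a hypothetical `threshold` are dropped, and connected components of the thresholded graph stand in for a proper community detection algorithm. The abstract does not specify the similarity measure or the community detection method, and the grid names and series below are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def build_network(series, threshold=0.8):
    """Return {(i, j): weight} for grid pairs with |correlation| >= threshold."""
    edges = {}
    for i, j in combinations(sorted(series), 2):
        weight = abs(pearson(series[i], series[j]))
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


def communities(nodes, edges):
    """Connected components of the thresholded graph.

    This is a crude stand-in for community detection, not the thesis's method.
    """
    adjacency = {n: set() for n in nodes}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, groups = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Made-up climatic series for four hypothetical grid cells.
series = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 1, 4, 2, 3],
    "D": [10, 2, 8, 4, 6],
}
edges = build_network(series)
groups = communities(series, edges)
print(groups)
```

In this toy data, grids A and B vary together, as do C and D, so two groups emerge. In a predictor-discovery setting, a spatial aggregate of a climatic variable over each group could then serve as a candidate predictor.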
The problem of identifying monsoon predictors is then addressed with an autoencoder framework, which helps capture the non-linear relations among the variables. A mean absolute error of 4.4% is reported for monsoon prediction when the identified predictors are used with an ensemble of bagged regression trees. The method recaptures many of the existing monsoon predictors and also identifies some new ones.
A stacked autoencoder architecture is used next for predictor discovery and subsequent prediction in the more challenging tasks of early, late, and regional monsoon rainfall. Mean absolute errors of 6.1% and 4.9% are observed for the early and late monsoon predictions, and errors of 4.1%, 5.1%, 5.5%, and 6.4% are attained for the four regional monsoon predictions, respectively. These predictions are more useful for planning and for deploying appropriate agricultural policies.
Finally, the problem of predicting the active and break spells of the monsoon is addressed. These predictions are important for identifying extreme monsoon conditions. A linear discriminant analysis based classifier is used to classify individual monsoon days, and the classified days are then assembled to detect spells of the monsoon. Accuracies of 81% and 57% are achieved in classifying the dry and wet days of the monsoon, respectively.
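The assembly of classified days into spells can be illustrated with a short sketch. It assumes that the classifier has already produced one label per day (`"dry"` or `"wet"`) and that a spell is a run of at least `min_days` consecutive days with the same label. The threshold of three days and the function name `detect_spells` are assumptions made for illustration; the abstract does not give the actual assembly rule.

```python
from itertools import groupby


def detect_spells(labels, min_days=3):
    """Group consecutive same-class days into spells.

    Returns a list of (label, start_index, length) for every run of
    identical labels that lasts at least `min_days` days.
    """
    spells = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if length >= min_days:
            spells.append((label, start, length))
        start += length
    return spells


# Hypothetical classifier output for ten consecutive days.
daily = ["wet", "wet", "wet", "dry", "wet", "dry", "dry", "dry", "dry", "wet"]
spells = detect_spells(daily)
print(spells)
```

For the ten-day toy sequence this yields a three-day wet spell starting at day 0 and a four-day dry spell starting at day 5, while isolated single days are ignored.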
Prediction of the Indian monsoon remains a challenging task, in which the use of sophisticated machine learning as well as hybrid physics-statistical models is likely to grow in the future.