Deforestation Probability Assessment Using Machine Learning Algorithms in the Eastern Himalayan Foothills of India


14 (2022) 200077

Available online 31 March 2022

Deforestation probability assessment using integrated machine learning algorithms of Eastern Himalayan foothills (India)

Soumik Saha ^a,ϯ^, Sumana Bhattacharjee ^b^, Pravat Kumar Shit ^c^, Nairita Sengupta ^d^, Biswajit Bera ^a,*,ϯ^

^a^ Department of Geography, Sidho-Kanho-Birsha University, Ranchi Road, P.O. Purulia Sainik School, Purulia, 723104, India

^b^ Department of Geography, Jogesh Chandra Chaudhuri College (University of Calcutta), 30, Prince Anwar Shah Road, Kolkata 700 033, India

^c^ PG Department of Geography, Raja Narendralal Khan Women’s College (Autonomous), Vidyasagar University, Midnapore, 721102, India

^d^ Department of Geography, Diamond Harbour Women’s University, Sarisha, 743368, India

ARTICLE INFO

Keywords: Jaldapara National Park; Deforestation probability; Machine learning algorithms; AUC value; Support vector machine (SVM)

ABSTRACT

Jaldapara National Park, a biodiversity-rich protected area, is situated in the Terai-Dooars region of the Eastern Himalayan foothills. This study attempts to identify the deforestation probable zones of Jaldapara National Park and its surroundings by applying five machine learning algorithms: support vector machine (SVM), naïve Bayes (NB), random forest (RF), decision tree (DT) and artificial neural network (ANN). The results show that the northern and middle sections face a high rate of deforestation due to large-scale human encroachment, poaching and timber trafficking. The results also show that SVM is more accurate than the other models. The deforestation probability models are validated through receiver operating characteristic (ROC) curves and efficiency, sensitivity and specificity measurements. The area under the curve (AUC) values of the SVM, NB, RF, DT and ANN models are 0.907, 0.885, 0.825, 0.846 and 0.876 respectively. The novelty of this research is that such high-precision machine learning methods have not previously been applied to examine deforestation probability in this region of the Himalayan foothills.

1. Introduction

The expansion of economic projects and the accompanying decline of forest resources have been a primary concern of researchers and environmentalists over the last few decades. Forests play an important role in carbon storage, biodiversity conservation, ecosystem services, soil formation and conservation, air purification, continuation of the water cycle and oxygen production (Gibson et al., 2011). In simple terms, deforestation can be defined as the removal of vegetation cover from the land surface. The American Forest Society defines deforestation as the removal of trees due to agricultural activities, climatic conditions, grazing, disease, forest fire etc. (Yanai et al., 2012).

The Food and Agriculture Organization (FAO) defines deforestation as the qualitative and quantitative reduction and degradation of forest as well as forest health (Deacon, 1994). The literature identifies several leading deforestation-related factors, such as climatic factors (rainfall, solar radiation, temperature etc.), socio-economic factors (settlement, roads, infrastructure, population etc.), bio-physical factors (soil, geology, geomorphology etc.) and different biotic as well as abiotic disturbances (air pollution, soil pollution, pests etc.) (Bax and Francesconi, 2018).

At present, the high rate of deforestation is of great concern to researchers and environmentalists: the earth was covered by about 60 million km² of forest before human civilization, but less than 40 million km² of forest land remains today (FAO, 2015). Globally, between 2000 and 2012, around 2.3 million km² of forest were eliminated or cut down, at a rate of about 2 × 10⁵ km²/year. The south-east Asian countries faced severe forest loss between 2000 and 2005 due to large-scale timber extraction, unscientific development plans and agricultural expansion (Stibig et al., 2014). The Asia-Pacific region has seen a slight decline in the rate of forest loss over the last five years compared with the early 1990s (GFRA, 2015). National and international investment in forest regions, driven by the increasing demand for forest resources, is the central

* Corresponding author at: Biswajit Bera, Department of Geography, Sidho-Kanho-Birsha University, Ranchi Road, P.O. Purulia Sainik School, Purulia, 723104, India.

E-mail addresses: [email protected] (S. Saha), [email protected] (S. Bhattacharjee), [email protected] (P.K. Shit), [email protected] (N. Sengupta), [email protected] (B. Bera).

ϯ These authors contributed equally to this study.

Contents lists available at ScienceDirect

Resources, Conservation & Recycling Advances

journal homepage: www.sciencedirect.com/journal/Resources-Conservation-and-Recycling-Advances

https://doi.org/10.1016/j.rcradv.2022.200077


mechanism of forest loss, particularly in Latin America, Sub-Saharan Africa and South-East Asia (Dell’Angelo et al., 2017; De Schutter, 2011; Chamling and Bera, 2020b). In the south Asian countries and the islands, forest cover is being reduced tremendously due to large-scale plantation farming. Indonesia and Malaysia contributed around 53% and 34% of global palm oil production respectively in 2013. Indonesia overtook Brazil in the clearance of natural forests for timber from 2000 to 2012 (Margono et al., 2014). Contemporary scientific studies emphasize that tropical forests play an important role as a carbon sink, but in recent years large-scale deforestation has diminished their carbon storage capacity (Hansen et al., 2013). Over the last few decades, large-scale land acquisition has occurred in the global south, or the developing countries, by foreign or domestic investors who seek more forest resources and agricultural commodities; the governments of these countries generally accept the investors because of their potential to bring foreign technology and capital and to promote job creation and development activities (Chung, 2019). Protection of the global forest regions is crucial for climate change mitigation, local livelihood protection, biodiversity conservation etc., but the world’s forests are now embedded in a complicated network of international actors and policy makers for commercial trade (Verburg et al., 2013; Liu et al., 2015; Chamling and Bera, 2020a).

Illegal human intervention within forest pockets, poaching, cultivation in the vicinity of forests and the extension of tea plantations are highly responsible for deforestation, biodiversity loss and fragmentation of forest habitat in the tropical and sub-tropical moist or dry deciduous forest regions of southern Asia (Bera et al., 2021a). The forest cover of the Himalayan foothill zone is gradually deteriorating due to the huge expansion of agricultural activities, timber trafficking and infrastructural development (Bera et al., 2021b). Tropical deforestation is a significant factor in global climate change and is of great concern to environmentalists. It is also associated with modification of regional hydrological inputs, the regional climatic system, the global bio-chemical cycle and biodiversity loss (Puig, 2000; Fontan, 1994). The northern part of West Bengal was once characterized by dense forest cover. This region now contains several national parks and sanctuaries such as Buxa Tiger Reserve, Jaldapara, Gorumara, Neora Valley, Chapramari, Jorepokhri and Mahananda (West Bengal Forest Department, 2016), which are connected with biodiversity conservation and sustainable forest resource management. However, the forests of this region face an alarming condition due to human expansion, intensification of agricultural land and the infiltration of human activities into the forest region and its ecology (Dey, 1991).

Machine learning algorithms for predicting deforestation probability help researchers and policy makers to plan appropriately for areas with a high probability of deforestation. High-accuracy remote sensing satellite data, together with statistical and machine learning models, are now widely used all over the world to delineate deforestation probable zones accurately. Over the years, various kinds of techniques have been used for deforestation probability assessment (statistical approaches, spatial approaches and machine learning approaches) (Mayfield, 2015). Common parametric models are extensively used in various studies, but machine learning models are able to generalize from a huge set of data with precise representation (Dlamini, 2016). Machine learning approaches provide an influential and efficient way to deal with large datasets that are non-linear and high-dimensional and that involve complicated interactions and missing values (Bhattacharya, 2013). They can significantly improve model accuracy and are also used in different hazard assessment studies, such as deforestation, as well as in forest management (Rogan et al., 2008). Machine learning models offer various advantages over traditional statistical methods (Liu et al., 2018).

Fig. 1. Geographical location of the study area

Machine learning has become a popular branch of artificial intelligence and is frequently used in hazard prediction studies. Its main mechanism is to express the relationship between the target variable and the predictors by applying computer algorithms to a training dataset (Chen et al., 2017). Since the 1990s, machine learning approaches have been extensively used in environmental studies (Hsieh, 2009), and these ML methods have now become popular in research on forest ecosystems and degradation (Bhattacharya, 2013).

Random forest (RF) is a powerful and widely used machine learning method that can predict the target variable with a high accuracy rate (Devasena, 2014). Support vector machine (SVM) is another important machine learning algorithm, which has gained acceptance with the development of artificial intelligence and RS-GIS techniques (Huang and Zhao, 2018). Artificial neural network (ANN) is widely used in medicine and molecular biology, and it has been largely used in ecology and environmental sciences since the beginning of the 1990s.

Different soft computing models, such as artificial neural networks, neuro-fuzzy logic, decision trees, support vector machines (SVM) and the maximum entropy model, have been widely used by researchers all over the world to compute and predict different physical phenomena such as landslide susceptibility, forest fire susceptibility, deforestation susceptibility, groundwater potentiality etc. (Xu et al., 2012; Pradhan, 2013; Wu et al., 2014; Saha et al., 2020; Bera et al., 2020a). With the rapid development of artificial intelligence, the application of machine learning in remote sensing studies has become very popular (Mountrakis, 2011). Algorithms such as support vector machine, decision tree, random forest and artificial neural network have been applied to land cover classification and change detection, prediction of forest biomass and analysis of deforestation susceptibility (Grinand et al., 2013; Dlamini, 2016). Machine learning algorithms require a significant amount of data for training the model (Mountrakis, 2011), and the capability of ML models depends on the rigorous use of training and testing datasets. However, the lack of sufficient data is a major bottleneck that prevents the widespread application of machine learning models, particularly in forest research and forest ecosystem analysis (Liu et al., 2018). Remote sensing data combined with a Geographical Information System (GIS) can predict and help to solve many major earth-surface issues with high accuracy in a short time, and the results improve when it is coupled with other well-accepted statistical and machine learning models (Saha et al., 2020). The main objective of this study is to delineate the deforestation probable zones of the famous Jaldapara wildlife sanctuary (Eastern India) and its surrounding areas in the Himalayan foothills using various machine learning algorithms.

2. Study area

Jaldapara National Park is situated in the Terai-Dooars region of the Eastern Himalayan foothills of West Bengal and covers an area of 216.51 km². The whole of Jaldapara National Park and its surroundings are covered by riverine tropical forest, and it was declared a sanctuary in 1941 for its great variety of floral and faunal communities. Jaldapara is mainly famous for the conservation of the Indian one-horned rhinoceros. The Government of India declared Jaldapara a National Park in 2012 by combining the sanctuaries, which are home to various species such as leopard, elephant, Indian gaur, different types of birds, snakes etc. (Ghosh et al., 2013). The Jaldapara region extends from 26°31′ to 26°45′ N and from 89°14′ to 89°24′ E (Fig. 1), and Jaldapara National Park falls within the recently delimited Alipurduar district of northern West Bengal. The whole Jaldapara range is divided into two parts, wildlife sanctuary and reserve forest (Deb et al., 2018). The River Torsa divides the whole sanctuary region into two parts: the eastern part, known as Chilapata forest (Bhattacharyya and Padhy, 2013), and the western part, known as Jaldapara. Previously these two forests were connected with each other, but the forest region has become disconnected by corridors of severe deforestation since the colonial period. For administrative purposes the whole Jaldapara forest region is at present divided into 100 forest compartments, 30 beats and 9 ranges. The forest region has two main perennial rivers, the Malangi and the Torsa: the Malangi is mainly a rain-fed river whereas the Torsa is a glacier-fed river. The Bhabar, Terai and alluvial formations are the noticeable geological formations within the area (Shukla et al., 2017).

The dominant floral community has been classified into 36 species and 25 genera, along with 114 tree species and 75 planted species in various parts of the forest region (Table 1). Besides its great natural diversity, the region is home to various tribal and ethnic groups such as the Garo, Toto, Megh, Chakma and Munda, and its rich traditional culture is a primary resource of the area (Ghosh et al., 2021).

Table 1

Different important existing species variation in the study area (Jaldapara forest and its adjacent region)

| Name of the family | Scientific name | Area specific vernacular term |
|---|---|---|
| Actinidiaceae | Saurauia napaulensis | Gagun |
| | Saurauia roxburghii | Gagun |
| | Choerospondias axillaris | Labsi |
| | Lannea coromandelica | Jia |
| | Mangifera indica | Aam |
| | Mangifera sylvatica Roxb | Jangli Aam |
| | Spondias pinnata | Amaro |
| Arecaceae | Areca catechu | Supari |
| | Caryota urens | Rangbhang |
| | Roystonea regia | Bottle palm |
| Bignoniaceae | Markhamia lutea | NK |
| | Stereospermum chelonoides | Parari |
| | Oroxylum indicum | Totola |
| | Stereospermum tetragonum | Parari |
| Combretaceae | Terminalia bellirica | Bahera |
| | Terminalia elliptica | Pakasaj |
| Fabaceae | Dalbergia sissoo Roxb | Sissoo |
| | Leucaena leucocephala | Sobabul |
| | Samanea saman | Siris |
| | Saraca asoca | Asok |
| | Vachellia nilotica | Babul |
| Lauraceae | Litsea monopetala | Kutmero |
| | Machilus gamblei King | Kawla |
| Meliaceae | Swietenia mahagoni | Mehagini |
| | Walsura tubulata Hiern | Phalame |
| | Artocarpus heterophyllus Lam | Kanthal |
| | Toona hexandra | Toon |
| Sapindaceae | Lepisanthes rubiginosa | Reetha |
| | Sapindus mukorossi Gaertn | Ritha |
| | Litchi chinensis Sonn | Litchu |


Fig. 2. Different thematic layers for deforestation prediction zone analysis, a. settlement density b. distance from settlement c. agricultural density d. distance from road e. LULC.


3. Material and Methods

3.1. Database

Eleven parameters (Figs. 2 and 3) have been selected in this research as controlling factors of deforestation: distance from river, agricultural density, altitude, settlement density, forest density, distance from settlement, distance from road, slope, aspect, Normalized Difference Vegetation Index (NDVI) and Land Use Land Cover (LULC). Remote sensing satellite data, the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) and Global Forest Watch data have been used to prepare the thematic layers related to deforestation. The satellite image and the SRTM DEM were obtained from the USGS website (https://earthexplorer.usgs.gov/). The altitude, aspect, slope and distance from river layers were generated from the DEM, whereas the cultural layers (distance from road, settlement density, distance from settlement, LULC and agricultural density) were prepared from satellite image classification (Table 2). The deforested zones were extracted from the Global Forest Watch website (https://www.globalforestwatch.org/). A temporal scale of 30 years (1990-2020) has been considered for demarcating the deforestation zones, and the deforestation points and zones have been identified over this time span. Another important step in deforestation susceptibility analysis is to generate deforestation inventory points. Around 250 deforested points have been identified within the study region, and the same number of points has been generated randomly as non-deforested points.

Fig. 3. Different thematic layers for deforestation prediction zone analysis, a. elevation b. NDVI c. slope d. aspect e. Distance from river f. forest density

From a machine learning perspective, distinguishing deforestation from non-deforestation is a binary classification problem with two classes: deforestation points are labelled ‘1’ and non-deforestation points are labelled ‘0’. All points are randomly divided into two groups, training (70% of the data) and testing (30% of the data). The training dataset has been used to train the applied models and the testing dataset has been used to validate them.
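The labelling and 70/30 partition described in this section can be sketched as follows. This is a minimal illustration only: the point identifiers, the function names and the random seed are assumptions made for the example, and the text does not say whether the authors stratified the split, so a plain random shuffle is used.

```python
import random

def make_inventory(n_deforested=250, n_non_deforested=250):
    """Return (point_id, label) pairs; label 1 = deforested, 0 = non-deforested."""
    labels = [1] * n_deforested + [0] * n_non_deforested
    return list(enumerate(labels))

def split_train_test(points, train_fraction=0.7, seed=42):
    """Shuffle a copy of the points and cut it into training and testing sets."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for repeatability
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

inventory = make_inventory()
train, test = split_train_test(inventory)
print(len(train), len(test))  # 350 150
```

With 500 inventory points this gives 350 training and 150 testing points; the class balance inside each subset varies with the shuffle.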

3.2. Multi-collinearity test

In this research, a multi-collinearity test has been applied to avoid collinearity problems among the conditioning (explanatory) factors. The tolerance value and the variance inflation factor (VIF) have been used to quantify the severity of multi-collinearity (Table 3). A VIF value greater than 10 or a tolerance value less than 0.1 indicates a multi-collinearity problem (Johnston et al., 2018). The tolerance and VIF values are defined as follows,

$$\text{Tolerance} = 1 - R_J^2 \tag{1}$$

$$\text{VIF} = \frac{1}{\text{Tolerance}} \tag{2}$$

where $R_J^2$ is the coefficient of determination of the regression of explanatory factor $J$ on all the other explanatory factors.
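As a small worked check of Eq. (2) and of the thresholds quoted above, the sketch below takes the tolerance and VIF values exactly as they are transcribed in Table 3 of this text. The function names and the 1% relative tolerance used to allow for rounding are assumptions made for the example.

```python
# (factor, tolerance, VIF) as transcribed in Table 3 of this text.
TABLE_3 = [
    ("NDVI", 0.555, 1.802),
    ("Slope", 0.911, 1.097),
    ("Settlement density", 0.229, 4.366),
    ("Forest density", 0.602, 1.661),
    ("Distance from river", 0.799, 1.252),
    ("Distance from settlement", 0.492, 2.031),
    ("Distance from road", 0.683, 1.465),
    ("Agricultural density", 0.248, 4.038),
    ("Altitude", 0.588, 1.700),
    ("LULC", 0.65, 2.4),
    ("Aspect", 0.82, 3.017),
]

def vif_from_tolerance(tolerance):
    """Eq. (2): VIF = 1 / Tolerance."""
    return 1.0 / tolerance

def has_collinearity_problem(tolerance, vif):
    """Thresholds quoted in the text: VIF > 10 or tolerance < 0.1."""
    return vif > 10 or tolerance < 0.1

def inconsistent_rows(rows, rel_tol=0.01):
    """Factors whose listed VIF differs from 1/tolerance by more than rel_tol."""
    flagged = []
    for name, tolerance, vif in rows:
        expected = vif_from_tolerance(tolerance)
        if abs(expected - vif) / expected > rel_tol:
            flagged.append(name)
    return flagged

print([n for n, t, v in TABLE_3 if has_collinearity_problem(t, v)])  # []
print(inconsistent_rows(TABLE_3))  # ['LULC', 'Aspect']
```

No factor crosses either threshold. The LULC and Aspect rows, as they appear in this text, do not satisfy Eq. (2) exactly; the text alone does not show whether that comes from typesetting or rounding, and both rows remain well inside the thresholds either way.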

3.3. Application of different machine learning methods

3.3.1. Support vector machine (SVM)

Support vector machine (SVM) is a widely used machine learning algorithm based on the structural risk minimization principle, proposed by Vapnik (Vapnik, 1995). The algorithm separates the classes with a surface (the optimal hyperplane) and maximizes the margin between them (Abe, 2010). The training points nearest to the hyperplane are called the support vectors, and the aim of the hyperplane is to distinguish the different classes (Pradhan, 2013).

The aim of SVM is to find the n-dimensional hyperplane that differentiates the dataset, which is expressed as minimizing

$$\frac{1}{2}\lVert w \rVert^2 \tag{3}$$

subject to the constraint $y_i((w \cdot x_i) + b) \geq 1$, where $\lVert w \rVert$ is the norm of the normal vector of the hyperplane and $b$ is the scalar bias.

The cost function is given by

$$L = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{n} \lambda_i \left( y_i((w \cdot x_i) + b) - 1 \right) \tag{4}$$

where $\lambda_i$ is the Lagrangian multiplier, and $w$ and $b$ are obtained by the standard procedure. In non-separable cases the constraints are modified by slack variables $\xi_i$:

$$y_i((w \cdot x_i) + b) \geq 1 - \xi_i \tag{5}$$
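To make Eqs. (3) and (5) concrete, the sketch below evaluates the margin objective and the smallest slack that satisfies the relaxed constraint for a given hyperplane. It does not train an SVM; the weights, the bias and the three toy samples are hypothetical, and the class labels are coded as +1 and -1 (an assumption needed by this formulation, whereas the inventory points in the study are coded 1 and 0).

```python
def decision_value(w, b, x):
    """(w . x) + b for one sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack(w, b, x, y):
    """Smallest xi >= 0 such that y * ((w . x) + b) >= 1 - xi, as in Eq. (5)."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

def margin_objective(w):
    """Eq. (3): one half of the squared norm of w."""
    return 0.5 * sum(wi * wi for wi in w)

# Hypothetical hyperplane and toy samples with two factors each.
w, b = [1.0, -1.0], 0.0
samples = [([2.0, 0.0], 1), ([0.0, 2.0], -1), ([0.5, 0.0], 1)]

slacks = [slack(w, b, x, y) for x, y in samples]
print(margin_objective(w))  # 1.0
print(slacks)               # [0.0, 0.0, 0.5]
```

The first two samples lie outside the margin and need no slack; the third is on the correct side of the hyperplane but inside the margin, so it needs a slack of 0.5.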

The support vector machine is one of the most popular and widely used supervised machine learning classifiers (Dhingra and Kumar, 2019). A recent study analysed deforestation zones through land use land cover classification and reported an overall accuracy of 93.74% and a kappa coefficient of 0.92 for the SVM model (Babu and Sudha, 2018). Another study shows that the SVM model can reliably identify forest disturbance regions using 238 points in three different datasets, i.e. all forest (AUC = 90.14), temperate forest (AUC = 91.9) and tropical dry forest (AUC = 79.63) (Solórzano and Gao, 2022).

3.3.2. Random Forest model (RF)

Random forest (RF) is a widely used ensemble learning method proposed by Breiman (Breiman, 2001). During training, the RF algorithm builds many classification trees and generates the final model by averaging the outputs of all the trees. The two main parameters of the RF algorithm are i) the number of factors tried at each split, taken as the square root of the number of factors, and ii) the number of trees used to run the model. RF uses the technique of bootstrap aggregating (bagging). The RF algorithm also uses the Gini index as a splitting criterion, given by

$$I_T(p) = \sum_{i=1}^{j} p_i \sum_{k \neq i} p_k = 1 - \sum_{i=1}^{j} p_i^2 \tag{6}$$

Here, $T$ denotes the training dataset, $j$ is the number of classes and $p_i$ is the proportion of samples in $T$ that belong to class $i$.
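The two forms of the Gini index in Eq. (6) can be checked numerically with a short sketch. The function names are illustrative, and the labels follow the 1/0 coding used for the inventory points.

```python
from collections import Counter

def class_proportions(labels):
    """Proportion p_i of each class among the labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini_impurity(labels):
    """Right-hand form of Eq. (6): 1 - sum of p_i squared."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def gini_pairwise(labels):
    """Left-hand form of Eq. (6): sum over i of p_i times the sum of p_k, k != i."""
    props = class_proportions(labels)
    return sum(p_i * sum(p_k for k, p_k in enumerate(props) if k != i)
               for i, p_i in enumerate(props))

print(gini_impurity([1, 1, 0, 0]))  # 0.5
print(gini_impurity([1, 1, 1, 1]))  # 0.0
```

A node holding equal numbers of deforested and non-deforested points has the maximum two-class impurity of 0.5, and a pure node has impurity 0.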

Table 2

Data source and characteristics of the used thematic layers

| Thematic map layer | Data format | Source of the data |
|---|---|---|
| Altitude, slope, distance from river, aspect | Raster grid | Digital elevation model from USGS (https://earthexplorer.usgs.gov/) |
| Distance from road | Polyline | Google Earth Pro |
| Forest density, LULC, agricultural density, settlement density, distance from settlement, NDVI | Raster grid | Landsat 8 (OLI) image from USGS (https://earthexplorer.usgs.gov/) |
| Deforestation area layer | Raster grid | Global Forest Watch website (https://www.globalforestwatch.org/) |

Table 3

Collinearity statistics of the selected explanatory factors

Variables Tolerance VIF

NDVI 0.555 1.802

Slope 0.911 1.097

Settlement density 0.229 4.366

Forest density 0.602 1.661

Distance from river 0.799 1.252

Distance from settlement 0.492 2.031

Distance from road 0.683 1.465

Agricultural density 0.248 4.038

Altitude 0.588 1.700

LULC 0.65 2.4

Aspect 0.82 3.017


3.3.3. Artificial neural network (ANN)

An artificial neural network (ANN) is a statistical or mathematical model based on the functioning of biological neurons, which is fundamental to processing in the human brain. The model was proposed by McCulloch and Pitts in 1943. An ANN can simulate non-linear relationships among variables. The most commonly used type of ANN is the multi-layer perceptron (MLP). An MLP combines three kinds of layers: i) an input layer, ii) one or more hidden layers and iii) an output layer. In an ANN, a hyperbolic tangent or sigmoid activation function is widely used for mathematical convenience.

The hyperbolic tangent function is given by

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{7}$$

The artificial neural network algorithm follows the equation below,

$$net_j^l(t) = \sum_{i=0}^{\rho} \left( y_i^{l-1}(t)\, w_{ji}^{l}(t) \right) \tag{8}$$

Here, $t$ refers to the iteration, $l$ to the layer and $j$ to the neuron, and $i$ runs over the $\rho$ inputs received from layer $l-1$.

The $\delta$ factor for neuron $j$ in hidden layer $l$ is as follows,

$$\delta_j^l(t) = y_j^l(t)\left[1 - y_j^l(t)\right] \sum_{k} \delta_k^{l+1}(t)\, w_{kj}^{l+1}(t) \tag{9}$$

The weight between neuron $i$ of layer $l-1$ and neuron $j$ of layer $l$ is then updated as follows,

$$w_{ji}^l(t+1) = w_{ji}^l(t) + \alpha\left[w_{ji}^l(t) - w_{ji}^l(t-1)\right] + n\,\delta_j^{l}(t)\, y_i^{l-1}(t) \tag{10}$$

where $\alpha$ and $n$ refer to the momentum and the learning rate respectively.
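Eqs. (8) to (10) can be traced for a single neuron with the sketch below. It is illustrative only: the sigmoid activation is assumed because the factor y(1 - y) in Eq. (9) is the derivative of the sigmoid, and every numerical value (inputs, weights, momentum, learning rate) is hypothetical rather than taken from the study.

```python
import math

def net_input(prev_outputs, weights):
    """Eq. (8): weighted sum of the outputs of the previous layer."""
    return sum(y * w for y, w in zip(prev_outputs, weights))

def sigmoid(x):
    """Logistic activation; its derivative is y * (1 - y)."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_delta(y_j, next_deltas, next_weights_from_j):
    """Eq. (9): delta of hidden neuron j from the deltas of the next layer."""
    back = sum(d * w for d, w in zip(next_deltas, next_weights_from_j))
    return y_j * (1.0 - y_j) * back

def update_weight(w_t, w_prev, alpha, n, delta_j, y_i):
    """Eq. (10): momentum term plus learning-rate term."""
    return w_t + alpha * (w_t - w_prev) + n * delta_j * y_i

# Hypothetical values for one neuron with two inputs.
net = net_input([1.0, 0.5], [0.2, -0.4])      # 0.0
y_j = sigmoid(net)                            # 0.5
delta = hidden_delta(y_j, [0.1], [2.0])       # 0.25 * 0.2 = 0.05
w_new = update_weight(0.2, 0.1, alpha=0.9, n=0.5, delta_j=delta, y_i=1.0)
print(round(w_new, 3))  # 0.315
```

The updated weight is the old weight (0.2) plus the momentum contribution (0.9 × 0.1 = 0.09) plus the gradient contribution (0.5 × 0.05 × 1.0 = 0.025).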

3.3.4. Naïve Bayes (NB)

The naïve Bayes classifier is a family of algorithms based on Bayes' theorem that share a common principle. Naïve Bayes is widely used in machine learning because of its simplicity and linear run time; it is a simple probabilistic method that can accurately predict class membership probabilities (Farid et al., 2014). In this algorithm a covariance matrix is constructed from the mean of each class, and Bayes' theorem is then applied for discrimination (Bhargavi and Jyothi, 2009). The naïve Bayes classifier is given by

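$$y_{NB} = \underset{y_i \in \{\text{event},\ \text{non-event}\}}{\arg\max}\; P(y_i) \prod_{i=1}^{17} P(x_i \mid y_i) \tag{11}$$

$$P(x_i \mid y_i) = \frac{1}{\sqrt{2\pi}\,a}\, e^{-\frac{(x_i - n)^2}{2a^2}} \tag{12}$$

where $P(y_i)$ is the prior probability, $P(x_i \mid y_i)$ is the conditional probability, and $a$ and $n$ represent the standard deviation and the mean respectively.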
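A minimal sketch of Eqs. (11) and (12) is given below: each class score is the prior multiplied by the Gaussian likelihood of every factor, and the class with the larger score is returned. The two factors and all means, standard deviations and priors are hypothetical numbers chosen for illustration, not values estimated in the study.

```python
import math

def gaussian_likelihood(x, mean, sd):
    """Eq. (12): normal density with mean n and standard deviation a."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (math.sqrt(2 * math.pi) * sd)

def predict(x, priors, params):
    """Eq. (11): return the class with the largest prior-times-likelihood score.

    priors: {class: P(y)}; params: {class: [(mean, sd), ...]}, one pair per factor.
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for xi, (mean, sd) in zip(x, params[cls]):
            score *= gaussian_likelihood(xi, mean, sd)
        scores[cls] = score
    return max(scores, key=scores.get), scores

# Hypothetical per-class statistics for two factors (e.g. NDVI, distance from road in m).
priors = {1: 0.5, 0: 0.5}
params = {
    1: [(0.2, 0.1), (100.0, 50.0)],   # deforested
    0: [(0.7, 0.1), (600.0, 200.0)],  # non-deforested
}
label, scores = predict((0.25, 150.0), priors, params)
print(label)  # 1
```

In practice the product of many small likelihoods is usually computed as a sum of logarithms to avoid numerical underflow; the direct product is kept here to mirror Eq. (11).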

3.3.5. Decision tree (DT)

The decision tree algorithm is another widely used algorithm, with tree growth and tree pruning steps (Yeon et al., 2010). A decision tree is a machine learning algorithm that divides the data into different subsets. The tree grows by selecting, at each step, the attribute with the smallest entropy. Entropy is calculated by the following equation,

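$$Entropy(N) = -\sum_{j} p(c_j \mid N) \log_2 p(c_j \mid N) \tag{13}$$

where $p(c_j \mid N)$ represents the relative frequency of class $c_j$ in node $N$.

The entropy after splitting on the selected attribute $A$ is given by

$$Entropy_A(N) = \sum_{j=1}^{k} \frac{|N_j|}{|N|}\, Entropy(N_j) \tag{14}$$

where $N_j$ is the $j$-th of the $k$ subsets of $N$ produced by the split.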
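Eqs. (13) and (14) can be illustrated with a short sketch that computes the entropy of a node and the weighted entropy of a candidate split. The function names and the toy label lists are assumptions for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Eq. (13): entropy of the class labels in one node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(subsets):
    """Eq. (14): size-weighted entropy of the subsets produced by a split."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * entropy(s) for s in subsets)

node = [1, 1, 0, 0]
print(entropy(node))                    # 1.0
print(split_entropy([[1, 1], [0, 0]]))  # 0.0
```

A split that separates the two classes perfectly reduces the entropy from 1.0 to 0.0, so an attribute producing it would be preferred over one that leaves mixed subsets.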

3.4. Validation

Validation of the implemented models is an important step in any type of research. The receiver operating characteristic (ROC) curve is a useful tool for assessing the goodness of fit of the implemented models. It is generated by plotting sensitivity on the y-axis against 1 − specificity on the x-axis (Fig. 6). The ROC curve is a well-accepted validation method for predictive models of landslide, deforestation, groundwater potentiality, forest fire susceptibility etc. (Chen et al., 2017; Rahmati et al., 2017; Gigović et al., 2019; Saha et al., 2020). The area under the ROC curve (AUC) represents the capability of the model. A value near 1 represents high validity and high predictive power, whereas a value near 0 represents low validity or low predictive power (Table 4). AUC values are categorized into accuracy classes: excellent (0.9-1), very good (0.8-0.9), good (0.7-0.8), average (0.6-0.7) and poor (0.5-0.6). The AUC is computed from the ROC curve with the following equation,

$$S_{AUC} = \sum_{k=1}^{n} \left(X_{k+1} - X_k\right)\left(S_{k+1} - \frac{S_{k+1} - S_k}{2}\right) \tag{15}$$

In this equation $S_{AUC}$ signifies the AUC, and $S_k$ and $X_k$ represent the sensitivity and 1 − specificity respectively. The other statistical indicators used are TPR, FPR, TNR, FNR, efficiency etc.
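Eq. (15) is the trapezoidal rule applied to the ROC points, which the following sketch implements together with the accuracy classes listed above. The function names are illustrative; assigning a boundary value such as 0.9 to the higher class, and the label returned for values below 0.5, are assumptions, since the text does not specify them.

```python
def auc_trapezoid(fpr, tpr):
    """Eq. (15): fpr holds X_k (1 - specificity), tpr holds S_k (sensitivity).

    Both lists must be ordered by increasing fpr.
    """
    area = 0.0
    for k in range(len(fpr) - 1):
        area += (fpr[k + 1] - fpr[k]) * (tpr[k + 1] - (tpr[k + 1] - tpr[k]) / 2)
    return area

def accuracy_class(auc):
    """Accuracy classes quoted in the text; lower bounds treated as inclusive."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "very good"
    if auc >= 0.7:
        return "good"
    if auc >= 0.6:
        return "average"
    if auc >= 0.5:
        return "poor"
    return "unclassified"  # below the ranges given in the text

print(auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0
print(auc_trapezoid([0.0, 1.0], [0.0, 1.0]))            # 0.5
print(accuracy_class(0.907))                            # excellent
```

The first call is a perfect classifier, the second is the diagonal of a random classifier, and the third applies the classes to the AUC reported for the SVM model.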

The TPR and FPR are as follows,

$$TPR = \frac{TP}{TP + FN} \tag{16}$$

$$FPR = \frac{FP}{FP + TN} \tag{17}$$

where TP represents true positives, FN false negatives, FP false positives and TN true negatives.

Efficiency (E) is another assessment measure that has been used to evaluate the accuracy of the model (Fukuda et al., 2013). It is calculated using the following equation,

$$E = \frac{TP + TN}{TP + TN + FP + FN} \tag{18}$$
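The indicators of Eqs. (16) to (18) follow directly from the four cells of a confusion matrix, as in this sketch. TNR and FNR are included as the complements of FPR and TPR, which is their standard definition rather than a formula given in the text, and the counts used are hypothetical.

```python
def validation_rates(tp, fn, fp, tn):
    """TPR, FPR and efficiency per Eqs. (16)-(18), plus the complements TNR and FNR."""
    return {
        "TPR": tp / (tp + fn),
        "FNR": fn / (tp + fn),
        "FPR": fp / (fp + tn),
        "TNR": tn / (fp + tn),
        "E": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical confusion matrix for a 150-point testing set.
rates = validation_rates(tp=60, fn=15, fp=10, tn=65)
print(rates["TPR"])          # 0.8
print(round(rates["E"], 3))  # 0.833
```

By construction TPR + FNR = 1 and FPR + TNR = 1, which is a quick consistency check on any reported table of these rates.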

Table 4

Result of ROC & comparison in the different machine learning models

| Models | AUC | Std. Error | Asymptotic Sig. | Lower Bound | Upper Bound | TPR | FPR | TNR | FNR | Efficiency |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.907 | 0.025 | 0.000 | 0.858 | 0.956 | 0.892 | 0.086 | 0.914 | 0.108 | 0.814 |
| NB | 0.885 | 0.032 | 0.000 | 0.821 | 0.948 | 0.854 | 0.241 | 0.759 | 0.146 | 0.785 |
| ANN | 0.876 | 0.034 | 0.000 | 0.800 | 0.932 | 0.823 | 0.103 | 0.897 | 0.177 | 0.769 |
| DT | 0.846 | 0.039 | 0.000 | 0.740 | 0.892 | 0.816 | 0.166 | 0.834 | 0.184 | 0.764 |
| RF | 0.825 | 0.041 | 0.000 | 0.705 | 0.866 | 0.790 | 0.187 | 0.813 | 0.210 | 0.735 |

The performance of the machine learning models on the training and testing data has been evaluated using different error measures, i.e. root mean square error (RMSE), coefficient of determination (R²) and mean absolute error (MAE). These statistical indicators compare the outcomes of the applied models. In statistical modelling, the difference between an observed value and the associated computed value is termed the error (Chai & Draxler, 2014). All the statistical indicators have been computed using the R programming software.
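The three error measures named in this section can be sketched as below. The study computed them in R; Python is used here only for illustration. R² is implemented as one minus the ratio of residual to total sum of squares, which is one common definition and an assumption about the variant the authors used, and the observed and predicted values are hypothetical.

```python
import math

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed labels (1/0) and predicted probabilities.
observed = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.8, 0.1]
print(round(mae(observed, predicted), 3))        # 0.15
print(round(rmse(observed, predicted), 3))       # 0.158
print(round(r_squared(observed, predicted), 3))  # 0.9
```

Lower MAE and RMSE and a higher R² indicate a closer match between the observed and computed values.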

4. Result

4.1. Multi-collinearity analysis

The collinearity test indicates that there is no multi-collinearity problem among the explanatory factors. All parameters have a VIF value of less than 10 and a tolerance value greater than 0.1, which means that the variables are sufficiently independent to be used in the predictive models (SVM, NB, RF, DT and ANN) (Table 3).

Fig. 4. Different deforestation probable zones by different machine learning models a. SVM model b. NB model c. RF model d. DT model and e. ANN model

4.2. Deforestation probability analysis by SVM model

The support vector machine (SVM) model has been applied here to demarcate the deforestation probability zones in the Jaldapara forest region. The result of this prediction model has been categorized into five classes (Fig. 4(a)): very low (17%), low (16.90%), moderate (14.62%), high (20.64%) and very high (30.84%) (Fig. 7). This classification is useful for both the prediction and the assessment of possible deforestation. The raster output of the probability maps has been classified using the natural breaks method in the ArcGIS environment. The natural breaks method is a widely used and reliable raster classification method: it divides the raster data into natural categories that minimize the variance within classes and maximize the variance between classes (Jenks, 1967). The method classified the deforestation probability into five classes based on the threshold values very high (0.78-1), high (0.58-0.78), moderate (0.40-0.58), low (0.22-0.40) and very low (0-0.22), and the same classification method is applied to all of the models. High and very high deforestation probability pockets have been identified in the northern and middle parts of the study area, mainly in Uttar khairabari, Uttar madarihat, Nutanpara, Sidhabari, Suripara, Salkumarhat etc., whereas low and very low deforestation probability areas have been observed in the eastern and south-western parts of the study area, particularly in the localities of Uttar mandabari, Dakshinmandabari, Kumarpara, Lachhmandabri etc.
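Once the break values are known, assigning a probability to one of the five classes is a simple lookup, sketched below. The sketch does not compute natural breaks; it only reuses the thresholds reported in the text. Placing a value that falls exactly on a break in the higher class is an assumption, and the sample probabilities are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

BREAKS = [0.22, 0.40, 0.58, 0.78]  # thresholds reported in the text
CLASSES = ["very low", "low", "moderate", "high", "very high"]

def probability_class(p):
    """Map a deforestation probability in [0, 1] to its class."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return CLASSES[bisect_right(BREAKS, p)]

def class_shares(probabilities):
    """Percentage of cells falling in each class."""
    counts = Counter(probability_class(p) for p in probabilities)
    return {c: 100.0 * counts[c] / len(probabilities) for c in CLASSES}

cells = [0.10, 0.30, 0.50, 0.65, 0.90, 0.95]  # hypothetical raster cell values
print(probability_class(0.50))           # moderate
print(class_shares(cells)["very high"])  # 33.33333333333333 (2 of 6 cells)
```

Applied to every cell of a probability raster, `class_shares` yields area percentages of the kind reported for each model.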

4.3. Deforestation probability analysis by NB model

The naïve Bayes (NB) classifier has been applied here to demarcate the deforestation probable zones, and the output has been classified into five classes (Fig. 4(b)) using the natural breaks classifier: very low (18.6%), low (15.84%), moderate (19.47%), high (19.53%) and very high (26.53%) (Fig. 7). The NB model predicts that the northern and middle sections of the Jaldapara forest region face high and very high deforestation probability, particularly in Uttar khairabari, Uttar madarihat, Nutanpara, Sidhabari etc., whereas the eastern, western and north-western parts face low and very low deforestation probability, particularly in Uttar mandabari, Dakshinmandabari, Kumarpara, Lachhmandabri etc.

4.4. Deforestation probability analysis by RF model

The deforestation probability from the random forest model has been derived using the relative weights of the mean decrease in accuracy and the mean decrease in Gini index of the deforestation variables. The result of the RF model has been classified into five classes (Fig. 4(c)) using the natural breaks classifier: very low (25.70%), low (16.63%), moderate (19.27%), high (17.42%) and very high (20.98%) (Fig. 7). The RF model predicts that the northern part and some middle sections of the Jaldapara forest and its adjacent region face high deforestation probability, whereas the eastern, western and south-eastern parts experience low deforestation probability.

4.5. Deforestation probability analysis by DT model

A decision tree (DT) model was applied to demarcate the deforestation probability areas. The outcome of the DT model was classified into five categories (Fig. 4(d)) using the natural breaks classifier: very low (26.30%), low (16.62%), moderate (16.16%), high (15.91%) and very high (25%). High and very high deforestation probability areas are found mainly in the middle section and northern part of the Jaldapara forest region, while the low and very low classes are confined to the eastern and western parts of the study area.

4.6. Deforestation probability analysis by ANN model

The prediction result of the artificial neural network (ANN) model was classified into five classes (Fig. 4(e)) using the natural breaks classifier: very low (15.47%), low (17.83%), moderate (21.13%), high (20.99%) and very high (24.58%). High and very high deforestation probability areas were observed in the northern, middle and some eastern parts of the Jaldapara forest and the surrounding regions, particularly in Jaldapara, Nutanpara, Uttar khairbari, Madhya satali, Uttar simlabari etc., whereas the low and very low classes were observed in the eastern, western and some south-western parts of the study area, particularly in Kalaberia, Madhya madarihat, Uttar mandabari, Kumarpara etc.

4.7. Validation Assessment

A single validation method is not sufficient to validate a model properly (Saha et al., 2020). In this research, several validation methods were therefore applied. The SVM, NB, RF, DT and ANN models were evaluated using the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), efficiency (E), true positive rate (TPR) and false positive rate (FPR).
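The validation measures listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: efficiency is taken to be the share of correctly classified samples, the 0.5 cut-off is arbitrary, AUC is computed with the rank (Mann-Whitney) formulation, and the labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Count TP, FP, TN and FN after thresholding the predicted probability."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return TPR (sensitivity), FPR (1 - specificity) and efficiency."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    efficiency = (tp + tn) / (tp + fp + tn + fn)  # assumed definition
    return tpr, fpr, efficiency


def auc(y_true, y_score):
    """AUC as the probability that a positive sample outranks a negative one."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation samples: 1 = deforested, 0 = not deforested.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

tpr, fpr, efficiency = rates(*confusion_counts(y_true, y_score))
area = auc(y_true, y_score)
print(tpr, fpr, efficiency, area)
```

TPR, FPR and efficiency depend on the chosen cut-off, whereas AUC summarizes performance over all cut-offs, which is one reason for reporting several measures together.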

These methods indicate the prediction capability of the applied machine learning algorithms. The AUC values of the SVM, NB, RF, DT and ANN models are 0.907, 0.885, 0.825, 0.846 and 0.876 respectively, which indicates that SVM has the highest prediction capability in this research (Table 4). The sensitivity (TPR) of SVM, NB, RF, DT and ANN is 0.892, 0.854, 0.790, 0.816 and 0.823 respectively, and the false positive rate (FPR, i.e. 1 − specificity) of the models is 0.108, 0.146, 0.210, 0.184 and 0.177 respectively. These values illustrate the good predictive power of the models (Table 4). The efficiency values also show the robustness of the models: 0.814, 0.785, 0.735, 0.764 and 0.769 respectively. All the validation results show that the support vector machine (SVM) provides the best predictive result, followed by naïve Bayes (Table 4). To assess the performance of the deforestation models, different error measures were also applied: coefficient of determination (R²), mean absolute error (MAE) and root mean square error (RMSE). These error measures show that the support vector machine gives more satisfactory results than the other models. The R², MAE and RMSE values of the SVM model are 0.894, 0.092 and 0.178 respectively for the training set and 0.905, 0.079 and 0.159 respectively for the testing set (Table 5). The Wilcoxon Signed Rank Test was applied to test whether the applied deforestation susceptibility models differ significantly from each other. The result of this non-parametric test identifies the significant differences between the models based on Z and P values (Table 6). The importance of the controlling factors has also been analysed (Fig. 5).

Table 5

Measurement of accuracy of the used machine learning models through various error measurement techniques

| Error measures | SVM Training | SVM Testing | NB Training | NB Testing | RF Training | RF Testing | DT Training | DT Testing | ANN Training | ANN Testing |
|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | 0.178 | 0.159 | 0.214 | 0.196 | 0.278 | 0.326 | 0.297 | 0.341 | 0.287 | 0.257 |
| MAE | 0.092 | 0.079 | 0.124 | 0.105 | 0.184 | 0.214 | 0.194 | 0.216 | 0.176 | 0.187 |
| R² | 0.894 | 0.905 | 0.846 | 0.867 | 0.716 | 0.674 | 0.697 | 0.629 | 0.706 | 0.742 |
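The pairwise model comparison with the Wilcoxon Signed Rank Test can be sketched as follows. The code computes the Z statistic with the usual large-sample normal approximation, without tie or continuity correction, so it will not necessarily match the output of a particular statistics package. The function name and the paired probability values are hypothetical.

```python
import math


def wilcoxon_z(sample_a, sample_b):
    """Return (z, w_plus, w_minus) for paired samples.

    Zero differences are dropped and tied absolute differences receive
    average ranks. Differences are rounded to 10 decimals so that
    floating-point noise does not hide ties.
    """
    diffs = [round(a - b, 10) for a, b in zip(sample_a, sample_b)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)

    # Rank the absolute differences, averaging the ranks of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation for the smaller rank sum.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    return z, w_plus, w_minus


# Hypothetical probabilities predicted by two models at the same locations.
model_a = [0.91, 0.15, 0.62, 0.80, 0.33, 0.47]
model_b = [0.85, 0.22, 0.60, 0.71, 0.40, 0.52]
z, w_plus, w_minus = wilcoxon_z(model_a, model_b)
print(z, w_plus, w_minus)
```

For a two-sided test at the 5% level, an absolute Z value above about 1.96 corresponds to P<0.05, which is the criterion used to read Table 6.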

5. Discussion

Deforestation is one of the major concerns in environmental research because it is a main driver of environmental degradation (Kumar et al., 2014). At present, accurate demarcation of deforestation probability zones is an important tool for preventing deforestation. Appropriate determination of deforestation probable zones may help environmentalists, developers and planners to adopt proper management programmes, so it is a topic of concern among researchers all over the world. Statistical and probabilistic machine learning techniques have been developed and applied worldwide to improve the prediction of deforestation (Bera et al., 2020a; Kumar et al., 2014).

Machine learning algorithms have gained significant attention in environmental modelling because these models can capture the complex relationships between the dependent variables and the predictors. Various prediction models (such as the generalised linear model (GLM), artificial neural network (ANN) and Bayesian network (BN) models) are commonly used together with remote sensing data to predict deforestation probable zones (Fenton and Neil, 2013; Mayfield et al., 2017).

Table 6

Comparison of various machine learning models for deforestation probability assessment with the help of Wilcoxon Signed Rank Test

| Model comparison | Z^a value | Significance |
|---|---|---|
| SVM-NB | -14.146^b | P<0.05 |
| SVM-RF | 12.429^b | P<0.05 |
| SVM-DT | -12.847^c | P<0.05 |
| SVM-ANN | -13.548^b | P<0.05 |
| NB-RF | -14.578^b | P<0.05 |
| NB-DT | -14.578^c | P<0.05 |
| NB-ANN | 12.178^b | P<0.05 |
| RF-DT | -2.547^b | P>0.05 |
| RF-ANN | 12.698^c | P<0.05 |
| DT-ANN | -6.478^b | P<0.05 |

Here, ^a represents the Wilcoxon Signed Rank Test; ^b and ^c represent positive and negative rank respectively.

Fig. 5. Analysis of the importance of various explanatory factors or predictors in the used machine learning models

All five machine learning models indicate that the northern and middle parts of the Jaldapara forest and its adjacent areas face high deforestation probability (Fig. 4). The northern section of the Jaldapara forest and its surrounding regions is situated in the foothills of the eastern Himalaya. This is a piedmont area, and altitude increases towards the Shivalik Himalayas. The eastern and western parts of the study area face low deforestation probability due to high forest density, restricted human movement and strict forest rules and regulations. The River Torsha divides the study area into two parts. Most of the models indicate that forests such as the Torsha forest range, Chilpata forest and the Jaldapara national park area have low deforestation probability, whereas the Salkumar forest area and Dakshinbarajhor forest have high deforestation probability due to significant settlement growth in the vicinity of the forests within the last decade. The middle section of the study area, which includes the localities of Kalaberia, Suripara, Nutanpara, Salkumarhat etc., has experienced various anthropogenic activities along with tribal settlement. Previously, the entire Himalayan foothill belt was covered by dense tropical forest, but since the colonial period the land use pattern has been immensely transformed. As per the wildlife conservation strategy 2002, there should be an eco-fragile zone around the national park to protect the core and buffer regions from different anthropogenic stresses (Deb et al., 2014). The LULC classification of the Jaldapara forest region indicates that there is no buffer zone around the core forest region, which can increase unwanted interaction between people and the forest and leads to forest degradation (Deb et al., 2018). The LULC change pattern of this region has been extracted from multi-temporal satellite images and classification methods for the period 1978 to 2016, and it has been observed that the dense forest area has changed considerably due to illegal encroachment and infiltration by tribal and forest fringe dwellers. In 1978 dense forest covered 7.93% of this area, and this share decreased to 5.42% in 2001 and 5.03% in 2016 (Deb et al., 2018).

Anthropogenic intervention is the most important driver of forest degradation in this region. The conversion of forest land into industrial plantation land over the last few decades is another important driver of forest degradation. During the British colonial era many tribal people came from Bihar and settled permanently in this region, and year after year they have moved further into the dense forest. Recently, local and national media have exposed illegal poaching and timber trafficking activities in this region, and such activities increase the rate of forest fragmentation. Most of the Jaldapara forest region lies in Alipurduar district and, according to the District Census Handbook 2011, this region is characterized by a large share of tribal people (18.89%) and marginal workers (9.27%). A significant share of the population is also engaged in agricultural activities (37.32%) (District census handbook, 2011). In recent years, many central and state government projects have been executed close to, and in some places within, the forest regions. Timber trafficking rackets have penetrated the dense and patch forest areas and indiscriminately cut valuable old trees.

Finally, timber traffickers sell these products in different national and international markets at high prices. As a result, patch forest areas have enlarged drastically year after year (Bera et al., 2020a and 2020b; Chamling et al., 2021).

Fig. 6. Receiver Operating Characteristic (ROC) curves for the validation of different machine learning models

5.1. Management approach of forest resources

Today, prediction and classification based machine learning and deep learning algorithms (artificial intelligence) are applied in many fields to solve and manage various problems quickly. Here, different machine learning algorithms have been used to identify the deforestation probability zones in the Eastern Himalayan biodiversity hotspot zone. The Eastern Himalayan foothill biodiversity zones provide extensive provisioning, regulating, supporting, cultural and spiritual services directly to the regional population as well as to a large global population. Recent studies report that large scale deforestation, wild animal and timber trafficking and forest habitat conversion have increased significantly over the last three or four decades in different pockets of the Eastern Himalayan foothills (Bera et al., 2020a). Conservation of forest resources along with forest habitat is therefore essential for the health of the whole environment, and relevant holistic forest management techniques should be considered (Fig. 8).

• Different schemes of Joint Forest Management (JFM) should be implemented in different forest pockets of India through proper coordination between forest dwellers, forest fringe people and forest officers (Murali et al., 2002; Bera et al., 2021b).

• Large scale use of non-timber forest products (NTFPs) should be extended, particularly for the forest dwellers and forest fringe people (Suleiman et al., 2017).

• Capacity building, alternate income generation and enhancement of tribal livelihood are essential for the people who reside close to the forest and within the forest pockets (Bera et al., 2021b).

• Financial support should be provided to tribal and non-tribal people residing in the forest zones to initiate home stay tourism, plantation and agriculture (Chamling et al., 2021).

• Forest protection regulations and acts should be strictly enforced against timber and wild animal traffickers and poachers (Bera et al., 2020a; Bera et al., 2021b).

• Awards and centres of excellence should be set up for outstanding research on forests and on the protection and management of biodiversity (Bera et al., 2021b).

• All festivals and cultural programmes should be celebrated with new tree plantation (Masiero et al., 2015; Bera et al., 2021b).

• Environmental education and awareness should be spread among students and local people.

• The significance of biodiversity and the different causes of biodiversity loss should be incorporated in the school and college syllabus (Bera et al., 2021b).

6. Conclusion

We live in an era of deforestation and land degradation, which is a global concern among policy makers, administrators and environmentalists. In this research, the support vector machine (SVM) model provided more accurate and precise results than the other machine learning models, owing to its high sensitivity and high AUC value (0.907). High population growth and human-forest conflict are serious concerns in India, and, as in other developing countries, the forest regions are under considerable anthropogenic stress because local people depend on them for their daily needs. In the present context, deforestation probability zone analysis, together with daily monitoring and proper management strategies, can support forest sustainability. This study identified the very high deforestation probable zones along with their underlying causes, so that the government can adopt appropriate management strategies. In this regard, the creation of an artificial forest buffer zone around the national park can improve the health of Jaldapara national park and support alternate livelihoods for forest dwellers and forest fringe people. The Joint Forest Management (JFM) scheme should be implemented to improve forest health, particularly in different pockets of this study area. The use of non-timber forest products (NTFPs) should be restricted to forest dwellers and forest fringe people in this area. The government, together with the forest department, should also strictly enforce the rules and regulations against wild animal and timber traffickers. Further research is required to enhance the alternate livelihoods of the forest dwellers and forest fringe people. More financial support should be provided for further research and development, particularly to conserve pristine natural resources. Community based forest management is an important tool in the present day context, where human and nature interaction occurs regularly (Datta and Deb, 2017; Bera et al., 2020b). The results of these machine learning models will assist policymakers in sustainable forest resource management along with wild species and wild habitat conservation.

Fig. 7. Bar graph showing the area of the different classes for the different models

Fig. 8. Flow diagram representing the different direct and indirect methods of forest management and biodiversity conservation

CRediT authorship contribution statement

Soumik Saha: Conceptualization, Methodology, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Sumana Bhattacharjee: Conceptualization, Supervision, Formal analysis, Writing – review & editing. Pravat Kumar Shit: Supervision, Formal analysis, Writing – review & editing. Nairita Sengupta: Formal analysis, Writing – review & editing. Biswajit Bera: Conceptualization, Methodology, Formal analysis, Writing – original draft, Writing – review & editing, Visualization.

Declaration of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Abe, S., 2010. Support Vector Machines for Pattern Classification. Springer, New York NY USA.

Babu, J.S., Sudha, D.T., 2018. Analysis and detection of deforestation using novel remote-sensing technologies with satellite images. In: IADS International Conference on Computing, Communications & Data Engineering (CCODE). Available at SSRN. https://ssrn.com/abstract=3187151.

Bax, V., Francesconi, W., 2018. Environmental predictors of forest change: an analysis of natural predisposition to deforestation in the tropical Andes region, Peru. Appl. Geogr. 91, 99–110. https://doi.org/10.1016/j.apgeog.2018.01.002.

Bera, B., Bhattacharjee, S., Sengupta, N., Saha, S., 2021b. Dynamics of deforestation and forest degradation hotspots applying geo-spatial techniques, Apalchand forest in Terai belt of Himalayan foothills: conservation priorities of forest ecosystem. Remote Sens. Appl.: Soc. Environ. 22, 100510. https://doi.org/10.1016/j.rsase.2021.100510.

Bera, B., Saha, S., Bhattacharjee, S., 2020a. Forest cover dynamics (1998 to 2019) and prediction of deforestation probability using binary logistic regression (BLR) model of Silabati Watershed, India. Trees Forests People 2, 100034. https://doi.org/10.1016/j.tfp.2020.100034.

Bera, B., Saha, S., Bhattacharjee, S., 2020b. Estimation of forest canopy cover and forest fragmentation mapping using Landsat satellite data of Silabati River Basin (India). KN J. Cartogr. Geogr. Inf. https://doi.org/10.1007/s42489-020-00060-1.

Bera, B., Shit, P.K., Sengupta, N., Saha, S., Bhattacharjee, S., 2021a. Susceptibility of deforestation hotspots in Terai-Dooars belt of Himalayan Foothills: a comparative analysis of VIKOR and TOPSIS models. J. King Saud Univ. - Comput. Inf. Sci. https://doi.org/10.1016/j.jksuci.2021.10.005.

Bhargavi, P., Jyothi, S., 2009. Applying naive bayes data mining technique for classification of agricultural land soils. Int. J. Comput. Sci. Netw. Secur. 9, 117–122.

Bhattacharya, M., 2013. Machine learning for bioclimatic modelling. Int. J. Adv. Comput. Sci. Appl. 4 (2), 1–8. https://doi.org/10.14569/IJACSA.2013.040201.

Bhattacharyya, M.K., Padhy, P.K., 2013. Forest and wildlife scenarios of northern West Bengal, India: a review. Int. Res. J. Biol. Sci. 2, 70–79. http://www.isca.in/IJBS/Archive/v2/i7/15.ISCA-IRJBS-2013-044.pdf.

Breiman, L., 2001. Random forest. Mach. Learn. 45, 5–32.

Chai, T., Draxler, R., 2014. Root mean square error (RMSE) or mean absolute error (MAE)? Geosci. Model Dev. 7 (1), 1247–1250. https://doi.org/10.5194/gmdd-7-1525-2014.

Chamling, M., Bera, B., 2020a. Likelihood of elephant death risk applying kernel density estimation model along the railway track within biodiversity hotspot of Bhutan–Bengal Himalayan Foothill. Model. Earth Syst. Environ. 6, 2565–2580. https://doi.org/10.1007/s40808-020-00849-z.

Chamling, M., Bera, B., 2020b. Spatio-temporal patterns of land use/land cover change in the Bhutan–Bengal foothill region between 1987 and 2019: study towards geospatial applications and policy making. Earth Syst. Environ. 4, 117–130. https://doi.org/10.1007/s41748-020-00150-0.

Chamling, M., Bera, B., Sarkar, S., 2021. Geospatial environmental modeling of forest declining trend in eastern Himalayan biodiversity hotspot region. Forest Resources Resilience and Conflicts, pp. 417–433. https://doi.org/10.1016/B978-0-12-822931-6.00030-7.

Chen, W., Xie, X., Wang, J., et al., 2017. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 151, 147–160. https://doi.org/10.1016/j.catena.2016.11.032.

Chung, Y.B., 2019. The grass beneath: conservation, agro-industrialization, and land–water enclosures in postcolonial Tanzania. Ann. Am. Assoc. Geogr. 109, 1–17. https://doi.org/10.1080/24694452.2018.1484685.

Datta, D., Deb, S., 2017. Forest structure and soil properties of mangrove ecosystems under different management scenarios: experiences from the intensely humanized landscape of Indian Sundarbans. Ocean Coast Manag. 140, 22–33. https://doi.org/10.1016/j.ocecoaman.2017.02.022.

De Schutter, O., 2011. Green rush: the global race for farmland and the rights of land users. Harvard Int. Law J. 52, 503–556.

Deacon, R.T., 1994. Deforestation and the rule of law in a cross-section of countries. Land Econ. 70 (4), 414–430. https://doi.org/10.2307/3146638.

Deb, S., Ahmed, A., Datta, D., 2014. An alternative approach for delineating eco sensitive zones around a wildlife sanctuary applying geospatial techniques. Environ. Monit. Assess. 186, 2641–2651. https://doi.org/10.1007/s10661-013-3567-7.

Deb, S., Debnath, M.K., Chakraborty, S., et al., 2018. Anthropogenic impacts on forest land use and land cover change: modelling future possibilities in the Himalayan Terai. Anthropocene 21, 32–41. https://doi.org/10.1016/j.ancene.2018.01.001.

Dell’Angelo, J., D’Odorico, P., Rulli, M.C., Marchand, P., 2017. The tragedy of the grabbed commons: coercion and dispossession in the global land rush. World Devel 92, 1–12. https://doi.org/10.1016/j.worlddev.2016.11.005.

Devasena, C.L., 2014. Comparative analysis of random forest, REP tree and J48 classifiers for credit risk prediction. Inter. J. Comp. Appl. 30–36.

Dey, S.C., 1991. Depredation by wildlife in the fringe areas of North Bengal forests with special reference to elephant damage. Indian For. 117, 901–908. https://doi.org/10.36808/if/1991/v117i10/8731.

Dhingra, S., Kumar, D., 2019. A review of remotely sensed satellite image classification. Int. J. Electr. Comput. Eng. 9 (2088-8708). http://doi.org/10.11591/ijece.v9i3.pp1720-1731.

District Census Handbook Koch Bihar, 2011. Census of India, West Bengal, Series-20 Part XII-B, Village and Town Wise Primary Census Abstract. Directorate of Census Operations, West Bengal.

Dlamini, W.M., 2016. Analysis of deforestation patterns and drivers in Swaziland using efficient Bayesian multivariate classifiers. Model. Earth Syst. Environ. 2 (4), 1–14. https://doi.org/10.1007/s40808-016-0231-6.

FAO (Food and Agriculture Organisation). 2015. http://faostat.fao.org/. (Access date: 12-4-2020).

Farid, D.M., Zhang, L., Rahman, C.M., Hossain, M.A., Strachan, R., 2014. Hybrid decision tree and Bayes classifiers for multi-class classification tasks. Expert. Syst. Appl. Int. J. 41, 1937–1946. https://doi.org/10.1016/j.eswa.2013.08.089.

Fenton, N., Neil, M., 2013. Risk Assessment and Decision Analysis with Bayesian Networks. CRC Press New York.

Fontan, J., 1994. Changements globaux et développement. Nat. Sci. Soc. 2 (2), 143–152. https://doi.org/10.1051/nss/19940202143.

Fukuda, S., Baets, B.D., Waegeman, W., Verwaeren, J., Mouton, A.M., 2013. Habitat prediction and knowledge extraction for spawning European grayling (Thymallus thymallus L.) using a broad range of species distribution models. Environ. Model. Softw. 47, 1–6. https://doi.org/10.1016/j.envsoft.2013.04.005.

GFRA (Global Forest Resources Assessment). 2015. FAO of UN. Retrieved from http://www.fao.org/forest-resources-assessment/documents/en/.

Ghosh, C., Ghatak, S., Biswas, K., Das, A.P., 2021. Status of tree diversity of the Jaldapara National Park in West Bengal, India. Trees Forests People 3, 100061. https://doi.org/10.1016/j.tfp.2020.100061.

Ghosh, C., Paul, T.K., Das, A.P., 2013. Rediscovery of Hibiscus fragrans roxburgh (Malvaceae) from Jaldapara National Park in Duars of West Bengal, India. Pleione 7, 531–537.

Gibson, L., Lee, T., Koh, L., et al., 2011. Primary forests are irreplaceable for sustaining tropical biodiversity. Nature 478, 378–381. https://doi.org/10.1038/nature10425.

Gigović, L., Pourghasemi, H.R., Drobnjak, S., Bai, S., 2019. Testing a new ensemble model based on SVM and random forest in forest fire susceptibility assessment and its mapping in Serbia’s Tara National Park. Forests 10 (5), 408. https://doi.org/10.3390/f10050408.

Grinand, C., Rakotomalala, F., Gond, V., Vaudry, R., Bernoux, M., Vieilledent, G., 2013. Estimating deforestation in tropical humid and dry forests in Madagascar from 2000 to 2010 using multi-date Landsat satellite images and the random forests classifier. Remote Sens. Environ. 139, 68–80. https://doi.org/10.1016/j.rse.2013.07.008.

Hansen, M.C., Potapov, P.V., Moore, R., et al., 2013. High-resolution global maps of 21st-century forest cover change. Science 342, 850–853.

Hsieh, W.W., 2009. Machine Learning Methods in the Environmental Sciences: Neural Networks and Kernels. Cambridge University Press, Cambridge, UK. https://doi.org/10.1017/CBO9780511627217.

Huang, Y., Zhao, L., 2018. Review on landslide susceptibility mapping using support vector machines. Catena 165, 520–529. https://doi.org/10.1016/j.catena.2018.03.003.

Jenks, G., 1967. The data model concept in statistical mapping. Int. Yearb. Cartogr. 7, 186–190.

Johnston, R., Jones, K., Manley, D., 2018. Confounding and collinearity in regression analysis: a cautionary tale and an alternative procedure, illustrated by studies of British voting behaviour. Qual. Quant. 52 (4), 1957–1976. https://doi.org/10.1007/s11135-017-0584-6.

Kumar, R., Nandy, S., Agarwal, R., Kushwaha, S.P.S., 2014. Forest cover dynamics analysis and prediction modelling using logistic regression model. Ecol. Indic. 45, 444–455. https://doi.org/10.1016/j.ecolind.2014.05.003.

Liu, J., Mooney, J., Hull, V., et al., 2015. Systems integration for global sustainability. Science 347, 1258832. https://doi.org/10.1126/science.1258832.

Liu, Z., Peng, C., Work, T., Candau, J.N., DesRochers, A., Kneeshaw, D., 2018.

Application of machine-learning methods in forest ecology: recent progress and