Research Paper
Landslide susceptibility mapping using machine learning algorithms and comparison of their performance at Abha Basin, Asir Region, Saudi Arabia
Ahmed Mohamed Youssef
a,b, Hamid Reza Pourghasemi
c,*aGeology Department, Faculty of Science, Sohag University, Egypt
bGeological Hazards Department, Applied Geology Sector, Saudi Geological Survey, P.O. Box 54141, Jeddah, 21514, Saudi Arabia
cDepartment of Natural Resources and Environmental Engineering, College of Agriculture, Shiraz University, Shiraz, Iran
A R T I C L E I N F O Handling Editor: E. Shaji
Keywords:
Landslide susceptibility Machine learning algorithms Variables importance Saudi Arabia
A B S T R A C T
The current study aimed at evaluating the capabilities of seven advanced machine learning techniques (MLTs), including, Support Vector Machine (SVM), Random Forest (RF), Multivariate Adaptive Regression Spline (MARS), Artificial Neural Network (ANN), Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), and Naive Bayes (NB), for landslide susceptibility modeling and comparison of their performances. Coupling machine learning algorithms with spatial data types for landslide susceptibility mapping is a vitally important issue. This study was carried out using GIS and R open source software at Abha Basin, Asir Region, Saudi Arabia.
First, a total of 243 landslide locations were identified at Abha Basin to prepare the landslide inventory map using different data sources. All the landslide areas were randomly separated into two groups with a ratio of 70% for training and 30% for validating purposes. Twelve landslide-variables were generated for landslide susceptibility modeling, which include altitude, lithology, distance to faults, normalized difference vegetation index (NDVI), landuse/landcover (LULC), distance to roads, slope angle, distance to streams, profile curvature, plan curvature, slope length (LS), and slope-aspect. The area under curve (AUC-ROC) approach has been applied to evaluate, validate, and compare the MLTs performance. The results indicated that AUC values for seven MLTs range from 89.0% for QDA to 95.1% for RF. Ourfindings showed that the RF (AUC¼95.1%) and LDA (AUC¼941.7%) have produced the best performances in comparison to other MLTs. The outcome of this study and the landslide susceptibility maps would be useful for environmental protection.
1. Introduction
Landslides are the common mass wasting processes along the mountainous regions of Saudi Arabia, where many cities, highways, and roads are located along the Arabian Shield (southwestern and western parts of the kingdom territory) (Youssef et al., 2012, 2013, 2014a, b;
Youssef and Maerz, 2013; Elkadiri et al., 2014; Maerz et al., 2014).
Different types of landslides (e.g., rock falls, rock and soil sliding, and debrisflows) in Saudi Arabia are usually triggered by intense rainstorms (Youssef et al., 2013, 2016). This problems will be augmented in the future due to urban area and highway expansion along mass wasting prone areas (rugged mountainous and steep slopes). This is especially true in Asir Region, Saudi Arabia, where thousands of people live in this mountainous region and commute along escarpment highways, which consider high-risk threat areas related to landslides. Therefore, it is essential to assess and prevent landslide disasters for this area.
Landslides represent the most damaging natural hazards in the mountainous areas of different parts of the world, causing loss of human life, property damage, and consequently economy crisis. These landslides triggered due to various external factors, including heavy rainfall, earthquakes, volcanoes, and anthropogenic activities (Guzzetti et al., 2007; Lin et al., 2007; Hadi et al., 2018; Roback et al., 2018; Strupler et al., 2018; Gordo et al., 2019; Roccati et al., 2019). To overcome these problems, landslide susceptibility maps can play a crucial role in deter- mining the most vulnerable areas for landslides. These models can be prepared using different landslides-conditioning factors (e.g., lithology, lineaments, geomorphology, soil type and depth, slope angle, slope aspect, curvature, altitude, engineering properties of the lithological material, land use patterns, and drainage networks). Various studies have been carried out on landslide susceptibility assessment using remote sensing and GIS techniques (e.g.,Saha et al., 2005; Pradhan et al., 2010a, 2010b; Pradhan and Youssef, 2010; Bednarik et al., 2012; Mohammady
* Corresponding author.
E-mail addresses:[email protected],[email protected](H.R. Pourghasemi).
Peer-review under responsibility of China University of Geosciences (Beijing).
H O S T E D BY Contents lists available atScienceDirect
Geoscience Frontiers
journal homepage:www.elsevier.com/locate/gsf
https://doi.org/10.1016/j.gsf.2020.05.010
Received 7 December 2019; Received in revised form 9 April 2020; Accepted 20 May 2020 Available online 17 June 2020
1674-9871/©2020 China University of Geosciences (Beijing) and Peking University. Production and hosting by Elsevier B.V. All rights reserved. This is an open access
article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
–
et al., 2012; Devkota et al., 2013; Pourghasemi et al., 2013a, 2013b, 2019; Xu, 2013; Regmi et al., 2014; Pham et al., 2016; Gordo et al., 2019;
Nohani et al., 2019; Roccati et al., 2019; Yan et al., 2019).
Landslide susceptibility is considered to be the spatial distribution of the probability of the occurrence of landslides by various investigators (Guzzetti et al., 2006; van Westen et al., 2006; Constantin et al., 2011).
Using various models in predicting landslide, damages could be decreased to a certain extent (Pradhan, 2010). Landslide inventories, together with data on the variables that control landslide development, can be used as the basis to predict the spatial distribution of future landslides and their characteristics. In the prognostic approaches based on past landslide events, the underlying working hypothesis is that future landslides will have similar distribution patterns to those that occurred in the past (i.e., the past is the key to the future; reversed Uniformitarian- ism). Depending on the information available, susceptibility models can be produced to predict the occurrence of future landslides. Susceptibility models represent the likelihood of a landslide occurring in any specific location in terms of relative probability (high, medium, low). Suscepti- bility maps can be produced for different types of natural hazards through several methods: (a) direct mapping of susceptibility zonation based on expert criteria (Ardau et al., 2007; Razak et al., 2011; Salvatici et al., 2018). These maps are subjective and depend on the expert judgments; (b) the deterministic models are based on stability analyses (mathematical relationships between resisting and driving forces) that take into consideration a number of parameters involved in the hazard processes (e.g., engineering characteristics of the rocks and soils, slope geometry, discontinuity characteristics, and hydrological conditions) (Yilmaz, 2009). These methods can be applied effectively to a single-slope scale (Ayalew and Yamagishi, 2005). However, the main limitations of these models are that simplifying the complexity of the hazard processes, generally static, incorporate geometrical suppositions, and difficult and expensive to acquire geotechnical and hydrological data especially when analyzing large and heterogeneous areas; (c) the heu- ristic approach involves establishing susceptibility classes by judging the relative contribution of a number of variables on landslide formation (e.g.Dai and Lee, 2002; Dahal et al., 2008a,b). The main disadvantage of the heuristic methods is the subjective component of the susceptibility assessments (Dai et al., 2001); (d) the probabilistic methods allow the construction of susceptibility models analyzing the statistical relation- ships between the spatial distribution of known landslides and that of a set of controlling factors. In the literature, various methods have been carried out to assess landslide susceptibility with the aid of GIS. Several researches present bivariate analyses, quantifying the spatial relation- ships between landslide and specific variables governing their distribu- tion (e.g.,Pradhan et al., 2010a,b; Constantin et al., 2011; Jaafari et al., 2014; Regmi et al., 2014; Razandi et al., 2015;Ding et al., 2017; Shirzadi et al., 2017; Sestras et al., 2019). More recent publications present multivariate logistic regression methods (Park et al., 2013; Shahabi et al., 2015; Tien Bui et al., 2016); knowledge-based methods (Althuwaynee et al., 2016; Kumar and Anbalagan, 2016), and multivariate binary lo- gistic regression (Mandal and Mandal, 2018); (e) various machine learning models were employed in landslide susceptibility mapping such as support vector machine-SVM (e.g., Tien Bui et al., 2012a, 2012b;
Pourghasemi et al., 2013; Hong et al., 2015; Colkesen et al., 2016; Lee et al., 2017; Kalantar et al., 2018; Pourghasemi and Rahmati, 2018), fuzzy logic-FL (Kumar and Anbalagan, 2015; Shahabi et al., 2015), artificial neuronal networks-ANNs (e.g.,Pradhan and Lee, 2010; Yilmaz, 2010;
Zare et al., 2013; Elkadiri et al., 2014; Arnone et al., 2016; Gorsevski et al., 2016; Wang et al., 2016; Pham et al., 2017; Aditian et al., 2018;
Pourghasemi and Rahmati, 2018), neuro-fuzzy-NF and adaptive neuro-fuzzy inference system-ANFIS (e.g.,Sezer et al., 2011; Dehnavi et al., 2015; Aghdam et al., 2016;Nasiri Aghdam et al., 2016; Chen et al., 2019), decision tree-DT (e.g.,Kavzoglu et al., 2014, 2015; Wu et al., 2014; Tien Bui et al., 2016), generalized additive model-GAM (e.g., Vorpahl et al., 2012; Goetz et al., 2015; Pourghasemi and Rahmati, 2018), adaBoost-AB (Micheletti et al., 2014), random forest-RF (e.g.,
Paudel and Oguchi, 2014; Trigila et al., 2015; Hong et al., 2016a;
Pourghasemi and Kerle, 2016; Youssef et al., 2016; Chen et al., 2017;
Taalab et al., 2018; Park and Kim, 2019), naïve bayes-NB (e.g.,Tsan- garatos and Ilia, 2016; Pham et al., 2017; Chen et al., 2018a,b), kernel logistic regression-KLR (Tien Bui et al., 2016; Chen et al., 2018b), boosted regression tree-BRT (Dickson and Perry, 2016; Youssef et al., 2016; Park and Kim, 2019), classification and regression tree-CART (Vorpahl et al., 2012; Felicísimo et al., 2013; Youssef et al., 2016; Chen et al., 2017; Pourghasemi and Rahmati, 2018), general linear model-GLM (e.g.,Youssef et al., 2016; Pourghasemi and Rahmati, 2018), multivariate adaptive regression splines-MARS (Vorpahl et al., 2012; Felicísimo et al., 2013; Conoscenti et al., 2015), maximum entropy-MaxEnt (Felicísimo et al., 2013; Park, 2015; Hong et al., 2016b; Kornejady et al., 2017), and quadratic discriminant analysis-QDA (Rossi et al., 2010; Pourghasemi and Rahmati, 2018).
The main advantages of the probabilistic and machine learning methods include their objective statistical basis, reproducibility, ability to quantitatively analyze the contribution of factors to landslide devel- opment, and their potential for continuous updating.
In the current study, seven advanced machine learning techniques including SVM, RF, MARS, ANN, QDA, LDA, and NB, were adapted to construct a landslides susceptibility maps using remote sensing and GIS techniques. These models were utilized for a number of reasons including,five of them have never been explored for landslide modeling in Saudi Arabia, suitable for regional and semi regional-scale applica- tions, and relying mainly on remote sensing datasets rather than exten- sivefield surveys. We believe that the acquired results from our work could provide a significant contribution to the scientific community.
Moreover, these landslide susceptibility maps can identify and delineate landslide-vulnerable areas, so that planners and decision makers can choose favorable locations for future development schemes, such as new urban areas.
2. Study area and geological setting
The current study, Abha Basin, covers an area of ~2717 km2within the Asir Region (Fig. 1). It is located at the southwestern portion of the Kingdom of Saudi Arabia between 180805100N to 182401700 N, and 422104500E to 423901900E. The study area has an elongated shape and dissected by many streams that drain their waters towards the main Abha Wadi which runs in the middle of the area towards northeast direction.
Abha City located in the southern portion of the study area which con- nected with other surrounding cities by a dense road-networks. In the recent years, the KSA witnessed an unprecedented expansion of urban areas and road networks. Subsequently, these urban areas and roads could be hampered by intensified landslides occurrences that put the community, urban areas, and infrastructures at risk. The study area located above the Abha Highland and its elevation ranges from 1,959 to 2,992 m above the mean sea level, some of these areas have steep topographic reliefs. The rainfall in most areas of the Saudi Arabia is scarce, infrequent, and generally occurring from October through April, however the southwestern region of Saudi Arabia (Asir Region), the spatial variation of the precipitation is high due to mountainous regions (Almazroui, 2011; Abu Abdullah et al., 2019). The southwestern region of Saudi Arabia is characterized by precipitation events during the entire year due to the topographically driven convective rain (Abdullah et al., 1998). Recently, unprecedented rainstorms happened all over the KSA.
These rainstorms increased in their intensity and frequencies (Abu Abdullah et al., 2019). In fact, rainstorms are considered as the main triggering mechanism for landslides in the study area. Abha highland receives around 300 mm/year of precipitation (Hasanean and Almazroui, 2015; Abu Abdullah et al., 2019). The high rainfall rates in Abha area are largely related to the presence of tropical air masses that reach the extreme southwestern part of Saudi Arabia and to the high elevations that induce orographic rainfall precipitation (Almazroui, 2011; Hasanean and Almazroui, 2015). Geologically, the study area is part of the Abha
Highland that belonged to Arabian Shield. The Arabian Shield (Red Sea mountains) of the Arabian Peninsula are mainly composed of igneous and metamorphic rocks that were originated ~700–1000 million years ago by the accretion of island arcs and closure of interleaving oceanic arcs (Greenwood, 1982). These rocks were uplifted with the opening of the Red Sea ~30 million years ago (Wolfenden et al., 2004). These basement complex rocks unconformably overlain by Paleozoic, Meso- zoic, and Tertiary successions (Coleman, 1973). The study area is char- acterized by different rock units according to Abha quadrangle (GM-75) 1:250,000-scale (Greenwood, 1985) (Fig. 2). These units were digitized
on ArcGIS 10.2 which include: (1) biotite-quartzite-plagioclase granofels;
(2) gabbro; (3) alluvium and gravel (this unit includes wadi alluvium, dissected terraces, colluvium, and fan deposits); (4) Jeddah and Bahah groups (they are inter-bedded with the Bahah group and bounded on the west by a major fault that separates it from Ablah group). It includes basalt and andesiteflows, pillow lava,flow breccia, and pyroclastic rocks.
Dacitic pyroclastic rocks and volcanoclastic conglomeratic, coarse tofine grained greywacke and phyllite are inter-bedded with the volcanic rocks.
These rocks are regionally metamorphosed to the greenschist facies, and sometimes to higher grades. Isoclinal to open folds are associated with Fig. 1. Location of the study area (located in Asir Region, southwest of the KSA), and landslide training and validation data sets.
the formation of the schistosity and imbricate faulting. Other complex folds were produced during the late deformation and the intrusion of plutonic rocks, (5) Bahah group within the Tayyah bet (it is a major component in the western part of the Tayyah belt). It consists of a fault bounded blocks (Greenwood, 1985) including abundant volcanic grey- wacke, local boulder conglomerate, carbonaceous shale, slate, chert, bedded tuff, and interbeds of volcanicflow rock. These rocks are weakly to moderately cleavaged and highly cleavaged near faults. They are characterized by the presence of one cleavage (schistosity) which has steep dips toward east or west. Greywacke is massive to thinly bedded including some sedimentary structures (grading, cross bedding, and lamina bedding). Massive greywacke forms thick beds from 1 to 3 m and interlayered with fine grained and laminated bedded sections. The greywacke and inter-bedded slate are strongly metamorphosed to greenschist facies. Some intrusive rocks, including granodiorite and granite were encountered in the Tayyah belt. Near the intrusive rock, contact amphibolite grade metamorphic rocks were encountered, (6) biotite monzogranite, (7) basalt and andesite: It is olivine basalt, which forms two volcanic cones and surrounding flows. According to the geomorphological evidence, it is related to Quaternary age, (8) meta- gabbro and gabbro, and (9) Wajid sandstone (remnant of Wajid sand- stone rests unconformable on Proterozoic rocks). It consists of light brown to reddish-brown quartz sandstone with some pebbles of quartz veins and quartzite at the base. In addition, the study area is character- ized by intense structural deformation where different faults, folds, and joints were encountered (Fig. 2).
3. Methodology
The current approach demonstrates the landslide susceptibility mapping using different data sources. The methodology of the study is presented inFig. 3.
3.1. Data collection
Various data sources includefield data, historical data acquired from civil defense authority, national reports, and questionnaires with dwellers were collected to identify the old landslide events and their frequencies. Satellite images including enhanced thematic mapper plus (ETMþ) (acquired in April 2016, spatial-resolution of 15 m (after image fusion of 30 m image composite with panchromatic band 15 m)), high- resolution images (GeoEye images with a spatial-resolution of 0.5 m which acquired in May 2016, and professional Google Earth data with a resolution less than 1 m), and DEM derived from 1:10,000 topographic contours using triangulated irregular network (TIN) data structures (spatial resolution 10 m) were used in this study. Geological map (Abha quadrangle, GM-75) with a scale of 1:250,000 was scanned and con- verted to a digital format. Finally, these data sets were projected ac- cording to UTM-Zone 38, WGS84 Datum.
3.2. Landslide inventory map
To prepare landslide susceptibility mapping, two datasets are essen- tially required. Thefirst dataset represents the landslide inventory map.
The second dataset is related to landslide conditioning factors. Landslide inventory data need to be evaluated against landslide conditioning fac- tors to determine their significance and effect on the occurrence of the landslides because it is typically assumed that landslides will occur under the same conditions as before. Performing a landslide susceptibility model needs a clear understanding of the relationship between the real landslides (landslide inventory dataset) and the landslide conditioning factors (Ercanoglu and Gokceoglu, 2004; van Westen et al., 2006; Petley, 2008). In the current study, an inventory map of existing landslides was prepared by integration of the historical records,field investigations, and data extracted from satellite images interpretation. Historical landslides Fig. 2. Geology and structural setting of the study area.
can be detected from high satellite imageries according to their geomorphological features (e.g., breaks in the vegetated area and bare soil, presence offlow materials along gullies, rims, and streams, different erosional features, flow tracks, depositional fans, circular and planar failures) (Youssef et al., 2009, 2016; Elkadiri et al., 2014). Field surveys were carried out to verify the historical and landslide extracted from high resolution images and to collect fresh/new landslides in the study area.
These landslides have volumes ranging from few cubic meters to about few hundreds cubic meters. All inventory data were assembled to establish a landslide inventory map. In the current study, a total of 243 landslides were used which distributed all over the study area (Fig. 1).
Then, the inventory data must be divided into the training and validating datasets to be used for building the models and verifying processes, respectively (Ohlmacher and Davis, 2003; Chacon et al., 2006). In landslide modeling, there is no specific pre-defined method that exists for categorizing inventory data. It is typically decided based on the acces- sibility and quality of data. According to the literature, the percentages commonly applied to divide the inventory dataset are 70% and 30% for the training and validating datasets, respectively (Nefeslioglu et al., 2008a; Pourghasemi and Rahmati, 2018). In this study, the landslide datasets were randomly divided into two groups, model training group was prepared using 70% of the inventory sites (173 sites) whereas, model validating group consists of 30% of the inventory sites (70 sites) (Fig. 1).
The validation process was executed by comparing the existing landslide
locations (validating group) with the acquired landslide susceptibility map.
3.3. Landslide conditioning factors (LCFs)
To achieve high accuracy of landslide susceptibility model in pre- dicting landslide vulnerable areas, selecting and preparing the LCFs database is vitally important step. The LCFs in the current study were selected based on the information collected from the literatures, related to the study area, andfield investigation (Pourghasemi et al., 2013a, b, c;
Tseng et al., 2015; Youssef, 2015; Keesstra et al., 2016; Tien Bui et al., 2016; Kornejady et al., 2017; Ghorbanzadeh et al., 2019; He et al., 2019;
Sevgen et al., 2019; Xiao et al., 2019).Goetz et al. (2015)indicated that precipitations and earthquakes are external and triggering factors of landslides. However, the previous data on these triggering factors in relation to landslide occurrence are not available in the current area;
therefore, these two triggering factors have not been involved in this study.Tseng et al. (2015) mentioned that the selection of landslide related factors (known as internal factors) depends on the characteristics of the study area, the landslide type, and the scale of the analysis. In the current study, twelve LCFs were chosen including, slope-angle, slope aspect, slope length (LS), distance from roads, lithology, distance from wadis, distance from faults, altitude, normalized difference vegetation index (NDVI), plan curvature, profile curvature, and landuse/landcover.
Fig. 3.Flowchart of the study showing modelling strategy.
These factors will be used in predicting the landslides prone areas in the area under consideration. A database including these 12 LCFs were generated using geographical information system (GIS) for data inter- pretation and analysis. Thematic layers with a 20 m 20 m spatial resolution pixel size were prepared (Fig. 4). All layers have UTM coor- dinate system zone 38 with a Datum of WGS 84. In this work, the LCFs have different types; nominal (categorical) where data do not have a natural order or ranking such as (lithology, slope aspect, and land- use/landcover), and ordinal (continuous data) in which the order matters such as (altitude, slope angle, slope length, NDVI, plan-curvature, pro- file-curvature, distance from faults, distance from roads, and distance from wadis).
3.3.1. Slope angle
The steeper the slope angle of the slope, the larger the number of the landslides. The stability of the slope against failure can be defined by the relationship between shear forces and the resistance to shear (safety factor), where the driving force of mass movement increases with increasing the slope angle (Guillard and Zezere, 2012).Tien Bui et al.
(2017)concluded that slope steepness is related both to the shear stresses acting on the hill slope and to the displacement of the landslide mass.
Magliulo et al. (2008)indicated that slope gradient play a vital role in subsurfaceflow and impacts on concentration of the soil moisture which are directly related to the landslide occurrence. The slope angle map of the current study was derived from the Digital Elevation Model (DEM) with a resolution of 20 m, ranging from 0to 59.8(Fig. 4a).
3.3.2. Slope aspect
Slope-aspect can impact various processes which have direct and in- direct influences on landslides including, wind directions, precipitation patterns, sunlight influence, discontinuities orientations, hydrological processes, evapotranspiration, concentration of the soil moisture, vege- tation, and root development (Neuh€auser et al., 2012; Quan and Lee, 2012; Devkota et al., 2013). Various studies indicated that there is a correlation between slope aspect and other geo-environmental factors (landslide-related factors) (Van Den Eeckhaut et al., 2009; Regmi et al., 2010).Capitani et al. (2013)applied conditional analysis to all of the possible combinations between the slope aspect and landslide-predisposing factors (lithology, slope angle, distance to the hydrographic elements and distance to the tectonic lineaments). They concluded that the slope aspect significantly influenced the distribution of superficial landslide types, but apparently not that of other landslide types. Slope aspect in the study area was constructed using DEM and classified into nine categories including flat (1), North (0–22.5; 337.5–360), Northeast (22.5–67.5), East (67.5–112.5), Southeast (112.5–157.5), South (157.5–202.5), Southeast (202.5–247.5), West (247.5–292.5), and Northeast (292.5–337.5) (Fig. 4b).
3.3.3. Slope length (LS)
The LS is another topographic factor that has a crucial impact on landslide susceptibility. Slope length incorporated with slope-angle and affect the soil loss and hydrological processes of the mountain areas (Pourghasemi and Rahmati, 2018). In the current study, the LS factor, ranges from 0 to 35.27 as shown inFig. 4c, was extracted from DEM using SAGA software according to Eq.(1)(Moore and Burch, 1986):
LS¼ As
22:13 0:4
Sinβ 0:0896
1:3
(1)
where,As(m2) is specific catchment area andβis in degree.
3.3.4. Distance from roads
The establishing of mountain roads (escarpment roads) require en- gineering activities such as cutting or excavating slopes, leading to changing the original geological conditions and weakening the natural support of rock slopes, these activities will have a significant and adverse
impact on the landslide occurrences (Wang et al., 2016; Xiao et al., 2019). Road construction (cutting and excavation) is an important anthropogenic factor that influences the topography of natural slopes (Xiao et al., 2019). Accordingly, distance from roads could be potential indicator for the landslides. In the current study, distance from roads has a range (0–2883 m), was developed using the Euclidean distance tool in ArcGIS 10.2 (Fig. 4d).
3.3.5. Lithology
Lithology variations have a significant impact on different types of geo-hazards (e.g., landslides and land subsidence) (Rahmati et al., 2016).
These units vary in physical and mechanical characteristics including, type, strength, degree of weathering, durability, density, and perme- ability (Henriques et al., 2015). In the current study, the lithology was extracted from the geological map (1: 250,000-scale) acquired from the Saudi Geological Survey database. To enhance the resolution of the li- thology map, landsat images (15 m resolution) were used to verify the lithology types using enhancement processing techniques (principle component analysis and band ratio combinations). Nine lithological units were identified including, biotite-quartzite-plagioclase granofels, gab- bro, alluvium and gravel, Jeddah and Bahah groups, Bahah group within the Tayyah bet, biotite monzogranite, basalt and andesite, metagabbro and gabbro, and Wajid sandstone (Fig. 4e).
3.3.6. Distance from wadis (main streams)
In the current study, DEM and topographic map were used to extract wadis (main streams) In fact, runoff along wadis plays a cardinal role in undercutting phenomena and increasing the pore water pressure of the areas adjacent to wadis and initiating landslide (Haigh and Rawat, 2012;
Hadji et al., 2013). Therefore, it becomes an essential conditioning factor in landslide susceptibility (Pradhan et al., 2010a,b). Distance from wadis, which has a range (0–1974 m), was extracted using the Euclidean dis- tance operation in ArcGIS 10.2 (Fig. 4f).
3.3.7. Distance from faults
The presence of structural discontinuities, which are tectonic breaks including faults, folds, fractures, joints, and shear zones, play a vital role in weakening the rock masses (decreasing the rock strength) and causing landslides (Lee et al., 2002; Kanungo et al., 2006; Bucci et al., 2016). So that, distance from faults could be potential indicators for the landslides.
In the current study, the geological map 1: 250,000-scale was used to extract the faults. Landsat images (15 m resolution) and high-resolution satellite images (GeoEye 05 m and Google earth<1 m resolutions) were used to verify and extract other main faults that were not detected in the geologic map using visual interpretation of satellite images. False color composites, enhancement andfiltering techniques of the Landstat images were provided the best visualization of the faults. The Euclidean distance tool in ArcGIS 10.2 was used to produce distance from faults map. Dis- tance from faults has a range of 0–2839 m (Fig. 4g).
3.3.8. Altitude
Altitude is controlled by various geological, geomorphological, and meteorological factors including, lithological units, weathering, wind action, and precipitations (Tsutsui et al., 2007; Vorpahl et al., 2012;
Pourghasemi et al., 2013a, b, c).Feizizadeh et al. (2014)indicated that altitude is one of the topographic factors that influence slope instability.
It has been frequently used in almost all landslide susceptibility analysis.
In this study, altitude map was prepared according to classification of the built DEM. It ranged between 1959 m and 2992 m above sea level (Fig. 4h).
3.3.9. Normalized difference vegetation index (NDVI)
The NDVI is considered an influencing variable in landslide suscep- tibility modeling (Althuwaynee et al., 2012). The NDVI plays an impor- tant role in immobilizing large amount of water and increasing the shear resistance and soil cohesion of the lithological mass (Sidle and Ochiai,
Fig. 4. The LCFs maps used in this study: (a) slope angle, (b) slope aspect, (c) slope length (LS), (d) distance from roads, (e) lithology, (f) distance from wadis, (g) distance from faults, (h) altitude, (i) NDVI, (j) plan curvature, (k) profile curvature, and (l) landuse/landcover.
2006). The NDVI was used in this study to mirror the vegetation density in any area. In general, the value of NDVI ranged from1 to 1; the high the value the denser the vegetation cover. The NDVI value of the current study area was extracted from the Landsat ETMþimages (acquired in January 2015) based on Eq.(2):
NDVI¼ðNIRRÞ
ðNIRþRÞ (2)
where NIR and R are the near infrared and red bands of the electro- magnetic spectrum. The NDVI values in the study area ranges from (0.4 to 0.585) where minus portion represents barren land and positive part shows vegetated areas (Fig. 4i).
3.3.10. Plan and profile curvatures
Haigh and Rawat (2012)indicated that landslide susceptibility could be influenced in different ways by slope shape and terrain morphology.
Curvature represents one of the basic terrain variables and utilized in various geomorphometrical analyses (Evans, 1979). For example, plan curvature has a direct impact on the convergence and dispersion of surface runoffs (Nasiri Aghdam et al., 2016). Whereas, the profile cur- vature affects the material deposition by managing the acceleration or deceleration of these materials on slope (Xiao et al., 2019). In the current work, plan and profile curvatures were extracted from the DEM using ArcGIS 10.2. The plan and profile curvatures were within the ranges of 16.0 to 12.88 and16.0 to 20.0, respectively (Fig. 4j and k).
3.3.11. Landuse/landcover
In the study area, human activities such as expansion of urban areas and infrastructure construction are playing significant role in the trig- gering of landslides. These engineering activities require cutting or excavating slopes for highways and building construction, leading to altering the original geological conditions and subsequently disturb the slope’s stability (Xiao et al., 2019). Based on the supervised classification technique of Landsat OLI images (acquired in 2016), the study area was classified into five landuse/landcover types including water bodies, buildup areas (residential areas), dense vegetation, rock outcrop, and sparse vegetation (Fig. 4l).
3.4. Modelling approach using machine learning techniques
Recently, machine learning techniques (MLTs) becomes a cornerstone in solving spatial modeling problems in the domain of natural hazards (e.g., landslide susceptibility assessment) (Shirzadi et al., 2018; Zhou et al., 2018; Ghorbanzadeh et al., 2019; Sevgen et al., 2019).Marjanovic et al. (2011)suggested that MLTs should be utilized as a supportive tool, rather than attempting to replace human expertise.Goetz et al. (2015) recommended that to produce new accurate modeling that could be used by decision makers, Earth scientists and planners, MLTs should be incorporated with the processing capabilities of GIS andfield dataset (e.
g., historical landslide inventory and geo-environmental factors). The machine learning techniques have some advantages including, (a) their ability to adjust its internal structure to the existing landslide data, (b) their capabilities in extracting knowledge from huge databases in an automatic way, and (c) they have ability to build classification (pre- dicting categorical predictor factors) and regression (predicting contin- uous dependent factors) in order to provide an accurate landslide model, their models are more cost efficient and rapid than conventional models and can be extended to large area analysis (Felicísimo et al., 2013;
Kavzoglu et al., 2019). In the current study, seven advanced machine learning techniques that vary in their degree of complexity were applied to evaluate their efficacy in landslide susceptibility mapping. They include SVM, RF, MARS, ANN, QDA, LDA, and NB.
3.4.1. SVM
It is a supervised learning method derived from statistical learning
theory and the structural risk minimization principle (Vapnik, 1995; Lee et al., 2017). This technique can be used for both classification and regression (Vapnik, 1995; Christianini and Shawe-Taylor, 2000). SVM deals with the binary classification model (Vapnik, 1995). The SVM al- gorithm uses the training data to generate a separating hyper-plane in the initial space of coordinates between two distinct categories. The aim of mentioned classification approach is not simply to separate the two classes, but to maximize the margin between them (Yao et al., 2008; Xu et al., 2012).Kanevski et al. (2009)showed that hyper-plane with a large margin should be more resistant to noise and possess better generaliza- tion than a hyper-plane with a small margin. It uses kernel functions to map the initial input space into a high-dimensional feature space where the points become more linearly separable (Chung and Fabbri, 2003).
Detailed illustrations of SVM modeling were examined by many authors (e.g.,Yao et al., 2008; Xu et al., 2012). To produce a successful SVM training and classification accuracy, it is required to choice of the adequate kernel function (Damasevicius, 2011). Four types of kernel function groups that are commonly used in SVM including linear kernel (LN), polynomial kernel (PL), radial basis function (RBF) kernel, and sigmoid kernel (SIG). In this study, the RBF was used for modeling. The
“Kernlab”package (Karatzoglou et al., 2016) was used in R 3.0.2 for landslide susceptibility mapping.
3.4.2. RF
The RF is an ensemble learning method generating many classifica- tion trees that are aggregated to compute a classification (Breiman et al., 1984; Breiman, 2001; Calle and Urrea, 2010; Micheletti et al., 2014).
Hansen and Salamon (1990)indicated that an ensemble of classification trees is more accurate than any of its individual members. One of the main advantages of RF is the resistant to over training and growing a large number of random forest trees where it does not create a risk of over-fitting (e.g. each tree is a completely independent random experi- ment). The RF algorithm data does not need to be rescaled, transformed, or modified. It has resistance to outliers in predictors and automatically handles the missing values. In this study, landslide susceptibility modeled using“randomForest”package (Briman and Cutler, 2018) in R 3.0.2 software.
3.4.3. MARS
The MARS model was used to separate the datasets into several splines (each one contain sub-classes) on an equivalent interval basis (Friedman, 1991). The data of each sub-classes are represented by a basis function (BF). The MARS predictor can be identified using Eq.(3)(Hastie et al., 2001):
fðxÞ ¼β0
Xp
j¼1
XB
b¼1
βjbð þ ÞMax 0;xj Hbj
þβjbð ÞMax 0;Hbjxj
(3)
wherex¼input,f(x)¼output,p¼predictor variables andB¼basis function.Max(0,x–H) andMax(0,H–x) are BF and do not have to each be present if their coefficients are 0. TheHvalues are called knots.
The MARS algorithm consists of three stages: a forward stepwise al- gorithm stage (to select certain spline) BFs, a backward stepwise algo- rithm stage (to delete BFs until the“best”set is located), and a smoothing stage, which gives thefinal MARS approximation a certain degree of continuity. The BFs are deleted in the order of least contributions using the generalized cross-validation (GCV) criterion (Craven and Wahba, 1979) using Eq.(4):
GCV¼N1 PN
i¼1½yifðxiÞ 2
1CðBÞN
2 (4)
whereNis the number of data andC(B) is a complexity penalty that increases with the number of BF in the model and which is identified using Eq.(5):
CðBÞ ¼ ðBþ1Þ þdB (5) wheredis a penalty for each BF included into the model. It can be also regarded as a smoothing parameter. MARS model was implemented in this study using“earth”package in R software (Milborrow et al., 2019) to prepare landslide susceptibility mapping.
3.4.4. ANN
The ANN is a quantitative method that makes use of nonlinear and complex learning and prediction algorithms to extract the complex re- lationships among the various independent landslide variables (Aleotti and Chowdhury, 1999; Chen et al., 2009; Pascale et al., 2013; Elkadiri et al., 2014). The architecture of this ANN is based on the Multi-Layer Perceptron (MLP). The MLP composed of a set of layers, each of which consists of a set of nodes (neurons). The MLP with the back-propagation algorithm is trained using a set of examples of associated input and output values (Turban and Aronson, 2001). The purpose of an ANN is to build a model of the data-generating process, so that the network can generalize and predict outputs from inputs that it has not previously seen.
A network has two use modes: learning and recall. Learning phase, car- ried out by back-propagation learning algorithm, used in providing a network which has initially random connection weights with a set of stimulus couples called learning examples (the input to the network and the expected output). Recall phase allows for the application of the synthesized knowledge from the preceding learning phase. In a succes- sive phase the neural network is able to provide coherent answers to input which is not presented in the training phase. In the current analysis, the“rminer”package (Cortez, 2020) was used for carrying ANN model by 12-6-1 network and learning rate of 0.1.
3.4.5. QDA
It is a univariate statistical method aimed at establishing a model based on the observed effective factors (Hong et al., 2017; Arabameri and Pourghasemi, 2019). Measurements of each class assumed to have a normal dispensation. However, the covariance of each class is not assumed to be the same (Eker et al., 2015). The one of the most advan- tages of QDA is the capability of dealing with different covariance values of the classes (Naghibi and Moradi Dashtpagerdi, 2017). The QDA ac- quires a category membership containing a square nnmatrix (nis number of expository variables) and a linear combination of these vari- ables according to Eq.(6)(Eker et al., 2015):
Q¼XTAXþOTXþZ (6)
whereAis thenncoefficient matrix,Oshows the linear combination coefficient, and Z is a constant. In the current work, QDA was executing using“caret”package in R software (Kuhn et al., 2017).
3.4.6. LDA
The LDA is a multivariate statistical technique, developed byFisher (1936), used for classifying objects from a set of input-independent fac- tors (Baeza and Corominas, 2001; Anderson, 2003; Baecher and Chris- tian, 2003; Eker et al., 2015; Hong et al., 2017; Arabameri and Pourghasemi, 2019). This model is robust, easy to use, has high prog- nostic accuracy, and the factor predictors are stable when the classes are well distinguishable (Steorts, 2014). In this technique, the estimated discrimination values (D) are determined by a linear combination which discriminates and clusters the causative variables into n classes using Eq.
(7)(Eker et al., 2015):
Di¼c1X1iþc2X2iþ…þ cnXni (7) whereXjiis theivalue of the independent variablej;cjis the discriminant coefficient for the variablej;Diis the value or discriminant scorei.
The discriminant coefficients are values that increase the distance be- tween the vector of mean values for each cluster (Baecher and Christian,
2003), namely in this research, the group of landslide occurrence and the group of non-occurrence of landslides. In this study, the R statistical package“caret”was utilized to apply the LDA model (Kuhn et al., 2017).
3.4.7. NB
This technique is a classification method which based on Bayes’
theorem. It assumes that all factors are utterly independent given the output class, which is called the conditional independence assumption (Soria et al., 2008). The most advantages of the NB technique are including; it is robust to noise and irrelevant variables, very easy to apply, and no need for complicated iterative schemes (Wu et al., 2008). Many researches were applied the NB method for landslide susceptibility mapping (Chen et al., 2009; Pham et al., 2016, 2017). Given an obser- vation consisting of n factorsxi,i¼1, 2,…,k, wherexirepresents the landslide-relating variables,nrepresents the number of LCFs, andy2{m, n), wheremandn indicate landslide and non-landslide areas, respec- tively, is the output class. NB estimates the probabilityP(yj/xi) for all possible output classes. The prediction is made for the class with the largest posterior probability using Eq.(8):
yNB¼ argmaxy
yiflandslide;nonlandslideg
PðyÞP
x1;x2;…::xyy
(8) To compute convenience, NB assumes that all variables are inde- pendent (Kim et al., 2017), thus Eq.(8)can be simplified to be Eq.(9):
yNB¼ argmaxy
yiflandslide;nonlandslideg
PðyÞYk
i¼1
P xyy
(9) To calculate each posterior probabilityP(xi|y), numerical attributes are usually assumed to follow a normal distribution with the value of the probability density function in Eq.(10)used for the posterior probability (Kim et al., 2017):
P
xyy¼ 1 2
xyye
ðxiμðijyÞÞ2
2πσ2ðijyÞ
!
(10)
whereμis the mean andσ2is the variance ofxi. In this study, Naïve Bayes algorithm was applied in R 3.0.2 using“rminer”package (Cortez, 2020) to model the landslide susceptibility.
3.5. The LCFs contribution
Evaluating the independent factors importance is vitally essential to identify the contribution of the different landslide factors and hence pre- cise determination of their role in modeling production. Various methods were successfully applied in landslide susceptibility mapping, such as the learning vector quantization (LVQ), and random forest techniques (RF) techniques (Pavel et al., 2008, 2011; Mondino et al., 2009; Youssef et al., 2016; Pourghasemi and Rahmati, 2018). In the current work, the relative importance and contribution of the all LCFs to landslide occurrence was assessed using RF data-driven model in R statistical package.
3.6. Modelling prediction and performance
In general, seven ML models were successfully established using the trained landslide datasets and the landslide susceptibility index (LSI) for every pixel in entire study area were calculated. Subsequently, these landslide susceptibility maps (LSMs) were reclassified into four suscep- tibility categories including low, moderate, high, and very high suscep- tible zones, using the natural breaks (Jenks) classification method (Pradhan and Lee, 2010; Pourghasemi and Kerle, 2016; Pourghasemi et al., 2020a, b,c,d,e).
To assess the prediction accuracy and performance of the models, the cross-validation results can be produced quantitatively and graphically by means of confusion matrices and their extracted statistics, receiver-
operating characteristic (ROC) plots, and prediction-rate curves (Beguería, 2006; De Sy et al., 2013; Rahmati et al., 2017; Pourghasemi et al., 2019a, 2019b, 2020). These results will help in extracting valuable practical information such as, (a) quantitative assessment of the predic- tion capability of the model, (b) identification of the prediction approach or mathematical functions that yield better predictions, (c) comparing the predictive capability of multiple landslide susceptibility models produced by different methodologies, (d) assessment of the capability of the model to discriminate the most prone zones and most importantly in many cases, the less susceptible areas (safest areas), (e) selection of the most significant factors and estimation of their contribution to the pre- diction of landslides, and (f) evaluate of the influence of the accuracy of input variables on the models, helping to enhance the quality of the model prediction. These knowledge may be vitally important for the selection of mitigation/remediation strategies.
The ROC curve, is a graph based on the true positive rate (sensitivity) and the false positive rate (1 specificity), can be considered as the statistic index of the overall performance of the model (Chung and Fabbri, 2003; Lee et al., 2004; Beguería, 2006).Mathew et al. (2009) indicated that the AUC is commonly remarkable index as the most useful accuracy statistic for landslide susceptibility analysis. In addition, the performance of MLT could also be achieved using RMSE statistic index to evaluate the credibility and quality of the landslide susceptibility pre- dictions when compared to observed values (Nefeslioglu et al., 2008b).
In the current study, the receiver-operating characteristic-area under the curve (ROC-AUC), and the root mean square error (RMSE) were applied to evaluate and measure the predictive performance of the MLTs.
Validation datasets (30% of the existing landslide validating datasets), that were not employed previously in the establishing of the models process, were executed to calculate the predictive performance of the models (Mason and Graham, 2002; Mohammady et al., 2012).
4. Results
The study area considers one of the richest and the most variable floristic regions of the Asir Mountains. One of the most famous mountains in the area is Jabal Al-Sooda, which located in the north western part of the watershed area, 2982 m high. The area is characterized by its tour- istic activities and also has a richflora. It has severe problem of landslides and degradation due to human activities, steep slope, weak geology, and rainfall and thus causing ecological imbalances. Landslides predictions obtained through landslide susceptibility modeling and simulation are nowadays seen as a milestone goal of natural resources studies. These models will play a vital role for the decision-making, watershed planners and engineers. Accordingly, the landslide susceptibility processes are necessary to be as accurate as possible.
4.1. Variables contribution analysis
To assess the importance of LCFs, the RF algorithm (Mean Decrease Accuracy and Mean Decrease Gini) was applied (Fig. 5). The importance of each factor is shown as a count of observations that appeared in x-axis (Fig. 5). The results from RF (Mean Decrease Accuracy) indicated that slope angle (gives 56.9 count of observations) is the most important contribution factor, followed by slope aspect (38.8), slope length (36.8), distance from roads (32.4), followed by lithology (23.2), distance from wadis (21.8), distance from faults (21.7), altitude (18.8), and NDVI (16.7). However, plan curvature (10.4), profile curvature (9.9), and landuse/landcover (4.9) seemed to have relatively less importance for landslide susceptibility modeling. The results from RF (Mean Decrease Gini) indicated that slope angle (gives 23.9 count of observations) is the most important contribution factor, followed by slope length (15.6), slope aspect (14.3), and distance from road (10.8), then distance from wadis (9.1), distance from faults (8.9), altitude (6.9), NDVI (5.9), li- thology (4.9). However, plan curvature (4.1), profile curvature (3.5), and landuse/landcover (1.3) give lower importance values.
4.2. Landslide susceptibility models
For the purpose of comparative analysis, the seven landslide suscep- tibility maps provided from the SVM, RF, MARS, ANN, QDA, LDA, and NB models are presented inFig. 6. Four susceptibility classes namely, low, moderate, high, and very high susceptible zones were assigned for each landslide susceptible model. The percentages relative area of these classes for each model was then calculated (Fig. 7).
4.3. Accuracy assessment and comparison
To evaluate the model performance, the AUC and RMSE statistics were calculated. Interestingly, according to AUC method (Table 1, Fig. 8), the variation in models performance among the MLTs was rela- tively high. The RF (AUC¼95.1%) and LDA (AUC¼94.1%) had the highest performances. They are followed by the ANN (AUC¼93.4%), SVM (AUC¼93.0%), MARS (AUC¼91.8%), NB (AUC¼91.6%), and QDA (AUC¼89.0%) models. All the MLTs exposed a sufficient perfor- mance (AUC > 70%). Aguirre-Gutierrez et al. (2013)indicated that model performance assessment using one factor such as AUC may not be an appropriate method because in some cases a high AUC may not be a guarantee for a high accuracy of the spatial predictions. For that reason, the RMSE measure also used as an additional criteria to evaluate model predictions and support decisions in model selection. The RMSE for applied MLTs ranges from 0.076 to 0.469 (Table 1) indicating a sub- stantial agreement between these models and the reality. The results of
Fig. 5. Factor importance of according to RF method, (a) Mean Decrease Accuracy and (b) Mean Decrease Gini.
this analysis revealed that MARS gives the best performance followed by the RF, the LDA, and the SVM algorithms, which yielded significantly better results. Finally, The ANN, the NB, and the QDA algorithms yielded the less significantly results than the other models in terms of the RMSE results. Accordingly, it was found that the performance of RF was always significantly higher than the other models based on the AUC method.
However, the MARS was significantly higher than the other models based on the RMSE method.
5. Discussions
The relative importance of the LCFs to a landslide model is dependent on the study area characteristics (Lin et al., 2010). Park (2015) mentioned that all independent variables will affect thefinal result of model prediction when concomitantly considered with other variables.
Dai and Lee (2002)andPourghasemi and Rahmati (2018)indicated that 6 LCFs play a vital role in predicting landslide susceptibility according to
generalize additive model (GAM) analysis including slope angle, distance from road, slope length, distance from fault, drainage density, and alti- tude. Ourfindings using importance analysis using RF is in agreement with the previousfindings of those authors wherefive factors used pre- viously have already considered in our RF importance categories.
Generally, using of the statistical models are the time-consuming in inputs process, outputs, and spatial analysis, while the MLTs have the advantage of automatically identifying the relationships between dependent and independent variables (Yilmaz, 2010).Felicísimo et al.
(2013)indicated that the most important factor that determine the ac- curacy of the landslide susceptibility mapping is the selection of the best model among of different machine learning techniques. Various machine learning models for landslide susceptibility mapping have been applied by many authors (Goetz et al., 2015; Tien Bui et al., 2016; Youssef et al., 2016; Pourghasemi and Rahmati, 2018), however, the prediction accu- racy of some of these techniques is still questionable. The selection of the best model among of various MLTs has a vital role in the landslide Fig. 6. Generated landslide susceptibility maps using: (a) SVM, (b) RF, (c) MARS, (d) ANN, (e) QDA, (f) LDA, and (g) NB.
susceptibility assessment (Rossi et al., 2010; Felicísimo et al., 2013). The most advantages of the MLTs are relatively easy to apply, and also the
prediction accuracy typically exceeds some of the conventional methods such as analytical hierarchy process (Tien Bui et al., 2012a,b; Pour- ghasemi and Rahmati, 2018). Zhou et al. (2018)concluded that the machine learning models (SVM and ANN) are excellent in both the landslide and rockfall susceptibility assessment. Their study demon- strates that the machine learning models are also applicative in complex nonlinear problem, where the SVM model has a better performance due to its globally optimal solution. In the current research, ourfindings complement previous results which have been done byFelicísimo et al.
(2013), who applied different machine learning models including mul- tiple logistic regression (MLR), MARS, CART, and maximum entropy (MaxEnt);Youssef et al. (2016), who applied RF, BRT, GLM and CART;
Zhou et al. (2018)who applied SVM and ANN; andPourghasemi and Rahmati (2018), who applied ANN, BRT, CART, GLM, GAM, MARS, NB, QDA, RF, and SVM techniques for landslide susceptibility assessment, and then compared their performances.
Various authors indicated that there are some issues related to land- slides susceptibility analysis which discussed as follow: (1) Lee et al.
(2017)assumed that these models could not be used and not valid for specific planning and assessment purposes, so that great caution should be exercised when using these models for specific site characterization, and the scale of the analysis should be considered with great care; (2) the landslide susceptibility models are depending on the variables used to implement the models. It was found that some models such as RF per- formed well in a specific areas (Catani et al., 2013), but give poor per- formance in other areas (Hong et al., 2016a).
Landslide susceptibility maps are capable to help produce a crucial guide for general planning and assessment purposes (Lee et al., 2017).
These techniques are very useful and practical for landslide management, landslide hazard and risk analysis (Einstein, 1988). Accordingly, under- standing the differences between various MLTs are essential to apply an optimal model for a specific study aim and/or area (Brenning, 2008).
For example; ourfindings indicated that these MLTs models could be vitally important as a preliminary techniques that help decision makers to identify the vulnerable areas for existing projects and for future new planning areas. So that potentially prone areas can be identified for detailed investigation. In addition, results of this study indicated that all the MLTs generate LSMs which are significantly performed higher (AUC ranges from 89% for QDA to 95.1% for RF) than the results produced by (Pourghasemi and Rahmati, 2018) (AUC ranges from 62.4% for GLM to 83.7% for RF). Also, our study especially for artificial neural networks (ANN) that give a prediction rate (AUC¼93.4%) was very close to the results ofElkadiri et al. (2014), who used ANN and logistic regression Fig. 7.Landslide susceptibility classes’areas percentage for all applied MLTs.
Table 1
Area under the curve and RMSE values of the models.
Models AUC (%) RMSE
SVM 93.0 0.391
MARS 91.8 0.076
RF 95.1 0.367
ANN 93.4 0.454
QDA 89.0 0.469
LDA 94.1 0.373
NB 91.6 0.464
Fig. 8. Prediction rate curves for the susceptibility maps produced in this study for different models.
(LR) modeling for debris-flow susceptibility assessment of Jazan. Their results indicated that the prediction performance for both models are 96.1% and 96.3% respectively. Also, the our results for RF with a pre- diction rate (AUC¼ 95.1%) was higher than the results obtained by Youssef et al. (2016), who applied RF, BRT, GLM and CART techniques for landslide susceptibility assessment along Wadi Tayyah Basin, Asir Region, Saudi Arabia. Their results indicated that the prediction rates are 81.2%, 85.6%, 86.2%, and 76.9% for RF, BRT, CART, and GLM models, respectively. We believed that these differences in prediction rates are due to the specific characteristics for each study area.
In addition to that results of RMSE values of MLTs which consider to be as additional criteria to evaluate the model predictions. Our results showed that MSE values range from 0.076 to 0.469 which are in sub- stantial agreement with the RMSE results generated byPourghasemi and Rahmati (2018)which range from 0.101 to 0.459.
The current study identified variables that can be contributed in landslides, and the results and methods that can be applied for landslide susceptibility mapping in other areas beyond the study areas. The novelty of the current study could be summarized as follow: (1) the application of seven MLTs are novel and new in Saudi Arabia, (2) coupling MLTs with spatial analytical methods for landslide susceptibility modeling is a worth considering issue, and (3) the presenting of comparison among the performances of the seven advanced MLTs.
6. Conclusion
In recent years, landslides have been considered to be the most crit- ical natural hazards (serious threat to life and property) worldwide as well in Saudi Arabia, and short- and long-term solutions are required.
Recently, monitoring and evaluating of landslide-prone areas using various state of-the-art MLTs in regional scale are fundamental for sus- tainable land management to be effective landslide susceptibility map represents an essential method to delineate the landslide prone areas.
This can be achieved with the help of integrating of MLTs and spatial data in GIS environment. The current study indicated that:
(1) A comprehensive comparison of the seven advanced machine learning models for Wadi Abha Basin, Asir Region, Saudi Arabia were carried out to identify areas of possible landslide susceptible areas. The susceptibility analyses were conducted through the integration of readily available remote-sensing data, GIS tech- niques, and machine learning techniques using seven models (SVM, RF, MARS, ANN, QDA, LDA, and NB).
(2) These models are novel approaches to perform the landslide sus- ceptibility mapping in the Wadi Abha Basin, Asir Region, Saudi Arabia.
(3) The importance of landslide conditioning factors in the current study that contributed in landslide phenomena were identified using RF algorithm (Mean Decrease Accuracy and Mean Decrease Gini). It is found that the most important components in landslide susceptibility modeling are slope angle, slope aspect, slope length, distance from roads, distance from wadis, distance from faults, altitude, and NDVI, whereas plan curvature, profile curvature, and landuse/landcover had the lowest contribution.
(4) The validation process was carried out by comparing the produced LSMs with the actual landslide occurrences using the ROC curve and RMSE methods. The performances of these models were evaluated by the ROC curves. Results indicated that the models performance are high with AUC values of 95.1%, 94.1%, 93.4%, 93.0%, 91.8%, 91.6%, and 89.0% for FR, LDA, ANN, SVM, MARS, NB, and QDA respectively. The RMSE indicated that there are an agreement between these MLTs models and the landslide real data. The RMSE results showed that the MARS, RF, LDA, and SVM algorithms yielded significantly best results and the ANN, NB, and QDA algorithms produced less significant results.
(5) The results from this study demonstrate the benefit of applying the optimal MLT with proper accuracy and lesser learning time in landslide susceptibility assessment. As a main advantage, MLTs utilize both continuous and categorical data without the need to classify continuous factors (e.g. altitude, proximities, drainage density, etc.), which provide results that are more robust and reliable, compared to the traditional statistical models, and eases the misclassification issues.
(6) The landslide susceptible maps, generated using MLTs, could be essential applied by the decision-makers, planners, and engineers, to identify landslide-prone areas, to prevent and mitigate the landslide risks areas, determine the suitable landuse planning areas, and establishing early warning systems.
(7) These MLTs are particularly useful in inaccessible areas and those lacking adequate monitoring systems. Remote-sensing datasets are nearly the only source of consistent and high-resolution ob- servations over remote and rugged mountainous regions where debrisflows typically occur.
(8) Application of the developed methodologies in the Saudi territory (hills and mountain areas) and elsewhere worldwide will lead to significant improvements in assessing and understanding the na- ture and scope of landslide problems and will contribute to building safe and sustainable communities around the world.
Declaration of competing interest
The authors declare that they have no known competingfinancial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This research was funded by Shiraz University (Grant No.
98GRC1M271143). Authors would like to thank three anonymous re- viewers and also Associate Editor Prof. E. Shaji for valuable comments.
References
Abdullah, M.A., Al-Mazroui, M.A., 1998. Climatological study of the southwestern region of Saudi Arabia. I. Rainfall analysis. Clim. Res. 9, 213–223.
Abu Abdullah, M.M., Youssef, A.M., Nashar, F., Abu AlFadail, E., 2019. Statistical Analysis of Rainfall Patterns in Jeddah City, KSA: Future Impacts. In: Abbot, J., Hammond, A.
(Eds.), Rainfall - Extremes, Distribution and Properties. IntechOpen.https://doi.org/
10.5772/intechopen.86774. Available from:https://www.intechopen.com/books/ra infall-extremes-distribution-and-properties/statistical-analysis-of-rainfall-patterns -in-jeddah-city-ksa-future-impacts.
Aditian, A., Kubota, T., Shinohara, Y., 2018. Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of ambon, Indonesia. Geomorphology 318, 101–111.
Aghdam, I.N., Varzandeh, M.H.M., Pradhan, B., 2016. Landslide susceptibility mapping using an ensemble statistical index (Wi) and adaptive neuro-fuzzy inference system (ANFIS) model at Alborz Mountains (Iran). Environ. Earth Sci. 75 (7), 553.
Aguirre-Gutierrez, J., Carvalheiro, L.G., Polce, C., van Loon, E.E., Raes, N., Reemer, M., Biesmeijer, J.C., 2013. Fit-for-purpose: species distribution model performance depends on evaluation criteria-Dutch hoverflies as a case study. PloS One 8 (5), e63708.https://doi.org/10.1371/journal.pone.0063708.
Aleotti, P., Chowdhury, R., 1999. Landslide hazard assessment: summary review and new perspectives. Bull. Eng. Geol. Environ. 58, 21–44.
Almazroui, M.A., 2011. Calibration of TRMM rainfall climatology over Saudi Arabia during 1998–2009. Atmos. Res. 99, 400–414.
Althuwaynee, O.F., Pradhan, B., Lee, S., 2012. Application of an evidential belief function model in landslide susceptibility mapping, Comput. Geosci. 44, 120–135.
Althuwaynee, O.F., Pradhan, B., Lee, S., 2016. A novel integrated model for assessing landslide susceptibility mapping using CHAID and AHP pair-wise comparison. Int. J.
Rem. Sens. 37, 1190–1209.
Anderson, T., 2003. An Introduction to Multivariate Statistical Analysis. In: Wiley Series in Probability and Statistics. John Wiley&Sons, Inc.
Arabameri, A., Pourghasemi, H.R., 2019. Spatial modeling of gully erosion using linear and quadratic discriminant analyses in GIS and R. In: Pourghasemi, H.R., Gokceoglu, C. (Eds.), Spatial Modeling in GIS and R for Earth and Environmental Sciences. Elsevier, pp. 299–321.https://doi.org/10.1016/B978-0-12-815226- 3.00013-2.
Ardau, F., Balia, R., Bianco, M., De Waele, J., 2007. Assessment of cover-collapse sinkholes in SW Sardinia (Italy). In: Parise, M., Gunn, J. (Eds.), Natural and