ORIGINAL RESEARCH Open Access
© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
FireXnet: an explainable AI-based tailored deep learning model for wildfire detection on resource-constrained devices
Khubab Ahmad1, Muhammad Shahbaz Khan2, Fawad Ahmed3, Maha Driss4,5, Wadii Boulila4,5, Abdulwahab Alazeb6, Mohammad Alsulami6, Mohammed S. Alshehri6, Yazeed Yasin Ghadi7 and Jawad Ahmad2*
Abstract
Background Forests cover nearly one-third of the Earth’s land and are some of our most biodiverse ecosystems.
Due to climate change, these essential habitats are endangered by increasing wildfires. Wildfires are not just a risk to the environment, but they also pose public health risks. Given these issues, there is an indispensable need for effi- cient and early detection methods. Conventional detection approaches fall short due to spatial limitations and man- ual feature engineering, which calls for the exploration and development of data-driven deep learning solutions. This paper, in this regard, proposes ’FireXnet’, a tailored deep learning model designed for improved efficiency and accu- racy in wildfire detection. FireXnet is tailored to have a lightweight architecture that exhibits high accuracy with sig- nificantly less training and testing time. It contains considerably reduced trainable and non-trainable parameters, which makes it suitable for resource-constrained devices. To make the FireXnet model visually explainable and trust- able, a powerful explainable artificial intelligence (AI) tool, SHAP (SHapley Additive exPlanations) has been incorpo- rated. It interprets FireXnet’s decisions by computing the contribution of each feature to the prediction. Furthermore, the performance of FireXnet is compared against five pre-trained models — VGG16, InceptionResNetV2, InceptionV3, DenseNet201, and MobileNetV2 — to benchmark its efficiency. For a fair comparison, transfer learning and fine-tun- ing have been applied to the aforementioned models to retrain the models on our dataset.
Results The test accuracy of the proposed FireXnet model is 98.42%, which is greater than all other models used for comparison. Furthermore, results of reliability parameters confirm the model’s reliability, i.e., a confidence interval of [0.97, 1.00] validates the certainty of the proposed model’s estimates and a Cohen’s kappa coefficient of 0.98 proves that decisions of FireXnet are in considerable accordance with the given data.
Conclusion The integration of the robust feature extraction of FireXnet with the transparency of explainable AI using SHAP enhances the model’s interpretability and allows for the identification of key characteristics triggering wildfire detections. Extensive experimentation reveals that in addition to being accurate, FireXnet has reduced computa- tional complexity due to considerably fewer training and non-training parameters and has significantly fewer training and testing times.
Keywords Wildfire, Fire detection, CNN, Transfer learning, Lightweight architecture
*Correspondence:
Jawad Ahmad [email protected]
Full list of author information is available at the end of the article
Resumen
Antecedentes Los bosques cubren cerca de un tercio de la superficie terrestre y representan unos de nuestros ecosistemas más diversos. Debido al cambio climático, estos hábitats esenciales están en peligro por el incremento de los incendios. Los incendios no solo representan un riesgo para el ambiente, sino también ponen en riesgo la salud pública. Dados estos temas, hay una necesidad indispensable para desarrollar métodos eficientes de detección temprana. Los enfoques convencionales de detección son de corto alcance debido a las limitaciones espaciales y las características ingenieriles de procesamiento manual, lo que llama a explorar y desarrollar soluciones basadas en la multiplicidad de procesamiento de datos aplicando el aprendizaje profundo (deep learning). Este trabajo, en ese sentido, propone el ’FireXnet’, un modelo ajustado de deep learning diseñado para mejorar la eficiencia y exactitud en la detección de incendios. FireXnet fue configurado para tener una arquitectura ligera que exhibe una gran exactitud con menor entrenamiento y tiempo de prueba. Contiene un número considerablemente reducido de parámetros entrenables y no entrenables, lo que lo hace adecuado para dispositivos limitados en recursos. Para hacer el modelo FireXnet visualmente explicable y confiable, fue incorporada una poderosa herramienta explicativa de la inteligencia artificial (AI) SHAP (Shapley Additive exPlanations). Ella interpreta las decisiones de FireXnet computando la con- tribución de cada característica a la predicción. Además, la performance de FireXnet fue comparada con otros cinco modelos pre-entrenados – VGG16, InceptionResNetV2, Inception V3, DenseNet201, y MobileNet2—para comparar su eficiencia. Para una buena comparación, el aprendizaje transferido y su sintonía fina fue aplicada a los modelos mencionados para re-entrenar los modelos en nuestro conjunto de datos.
Resultados La prueba de exactitud del modelo propuesto FireXnet es del 99.53%, lo que es mayor que todos los modelos usados para comparación. Además, la confiabilidad de los parámetros también confirma su confiabilidad (i.e. intervalo de confianza de 0.98, 1.00), lo que valida la certidumbre de la estimación de los modelos propuestos y un coeficiente Cohen´s kappa de 0.99, prueba que las decisiones de FireXnet están de acuerdo con los datos proporcionados.
Conclusión La integración de la robusta presentación extraída de FireXnet con la explicable transparencia de la AI usando SHAP mejora la interpretación del modelo y permite la identificación de las características clave que disparan las detecciones de incendios. La experimentación extensiva revela que adicionalmente a ser exacto, el modelo Fire- Xnet pudo reducir la complejidad computacional debido a considerablemente menores parámetros de entrenami- ento y no entrenamiento y significativamente menores tiempos de entrenamiento y testeos.
Introduction
Forest ecosystems play an essential role in the world’s biodiversity, as many forests on earth happen to be more biodiverse than ecosystems. Forests occupy more than 30% of the world’s land area (World Health Organi- zation 2022). Recently, most forests are experiencing alarmingly increased wildfires, especially due to climate change. Wildfires may originate from natural occur- rences, such as lightning strikes, volcanic eruptions, or periods of intense dry heat. In addition, human activi- ties, both unintentional and intentional, are also sig- nificant contributors to these fires. The practice of employing controlled fires to manage agricultural land and pastures is one such human-induced factor. Wild- fires also happen to be a major source of air pollution, generating a mixture of pollutants in the form of smoke.
The most hazardous constituent of wildfire smoke is par- ticulate matter, which is a potential public health haz- ard. Additionally, wildfires contribute to the emission of greenhouse gases and lead to the degradation of eco- systems. Wildfire-induced smoke and ash can have dev- astating impacts on the most vulnerable demographics,
including infants, pregnant women, the elderly, and individuals with pre-existing respiratory or cardiac con- ditions. From 1998 to 2017, the global mortality associ- ated with volcanic activity and wildfires was estimated at approximately 2400 (Food and Agriculture Organi- zation of United Nations 2020). Given the substantial consequences for both human populations and natural ecosystems, there is an indispensable need for the timely detection of wildfires. There are mainly two types of fire detection systems: (a) sensor-based and (b) vision-based (Sathyakala et al. 2018). The sensor-based approaches use various sensors to detect signals like temperature, sound, humidity, etc., whereas vision-based approaches involve images or videos of the fire.
Sensor-based approaches produce tangible results, but they lag behind vision-based systems for several reasons.
Several studies (Li et al. 2019; Qiu et al. 2019; Rjoub et al.
2022; Rizanov et al. 2020; Rizk et al. 2020; Correia et al.
2021) reveal that sensor-based systems often depend on predictive models and algorithms which, although sophisticated, may not account for the complexity and unpredictable nature of wildfires. In addition, they can
have significant spatial limitations, with some sensors offering a resolution as high as 28.6 m, which might not be sufficient for comprehensive early detection. The major challenge that sensor-based systems pose is the coverage of large areas. As opposed to sensor-based sys- tems, vision-based systems tend to be more beneficial.
Traditional methods for vision-based fire detection have been extensively investigated. Traditional vision-based systems usually consider the features involving color space (Çelik and Demirel 2009; Chen et al. 2004), spatial features (Hashemzadeh and Zademehdi 2019; Ko et al.
2010; Kong et al. 2016; Xuan Truong and Kim 2012) and features indicating motion (Foggia et al. 2015; Ha et al.
2012; Hashemzadeh and Zademehdi 2019; Xuan Truong and Kim 2012). Nonetheless, these techniques rely on the manual experiments-based adjustment of thresh- old parameters and depend heavily on expert knowl- edge for feature engineering. They also tend to employ handcrafted features, which only capture shallow fea- tures of the flame. Recently, deep learning has emerged to be a potential data-driven learning method that can learn through real-time data and can improve its deci- sions based on data-driven learning. Deep learning has successfully been applied across numerous domains, including object detection (Sun et al. 2019 and Ghandorh et al. 2022), classification (Tang et al. 2019 and Rasool et al. 2022), segmentation (Chen et al. 2018), and diag- nosis (Ben Atitallah et al. 2022 and Rehman et al. 2022).
Regarding fire detection, this shift in the utilization of deep learning has led to a growing interest in convolu- tional neural networks (CNN) for fire feature extraction.
Unlike traditional vision-based approaches, CNN-based approaches do not require manual feature extraction and can learn deep and more meaningful features from the provided data. The interest in using CNNs has resulted in a considerable increase in their accuracy (Dunnings and Breckon 2018; Muhammad et al. 2019a, 2019b; Saeed et al. 2019), even on unseen data and in uncertain practi- cal environments (Muhammad et al. 2019b).
While the deep learning-based methods have shown promising results, they bear limitations, especially in terms of computational cost and inference time (Li et al.
2022). Several studies use ensemble methods, such as in (Ghali et al. 2022), but these methods may incur a high computational cost and time. Despite achieving high accuracy, the real-time applicability of deep learning- based systems for different types of fire and environ- mental conditions remains a concern. The limitations of existing studies, such as high computational cost, signifi- cant inference time, model size, and the total number of parameters, suggest that there is a need for more tailored and efficient deep learning models that can ensure high accuracy, faster inference time, and robustness to various
conditions for early wildfire detection. In addition, most of the existing approaches do not incorporate explainable AI. The application of explainable AI (XAI) in wildfire detection can enhance interpretability and trust in deep learning models. It offers visibility into the decision-mak- ing process, allowing researchers to understand why spe- cific fire characteristics trigger detections. Furthermore, XAI can aid in identifying and correcting potential biases in the model, improving its performance, and allowing more precise and reliable early fire detection.
For the automated and efficient detection of wildfires, this paper presents a tailored CNN named “FireXnet”
that has improved accuracy and reduced computational time. The proposed lightweight architecture has con- siderably reduced trainable and non-trainable param- eters, which makes this model lightweight. Moreover, a powerful explainable AI tool, i.e., SHAP (SHapley Addi- tive Explanations) has also been incorporated to break down the prediction of the proposed model and to show the impact of each feature. Furthermore, to compare the performance of the proposed “FireXnet” model, transfer learning with fine-tuning has been applied to five pre- trained models, i.e., VGG16, InceptionResNetV2, Incep- tionV3, DenseNet201, and MobileNetV2.
Main contributions
The main contributions of this paper are:
1. A tailored lightweight CNN, named “FireXnet,” is proposed, demonstrating high accuracy with signifi- cantly less training and testing time. The proposed model contains a considerably reduced amount of trainable and non-trainable parameters.
2. To make the proposed FireXnet model trustable and visually explainable, a powerful XAI tool, SHAP (SHapley Additive Explanations) has been incorpo- rated to interpret its decisions by computing the con- tribution of each feature to the prediction and visu- ally representing this feature extraction.
3. To compare the performance of the proposed Fire- Xnet model, five state-of-the-art pre-trained mod- els, i.e., VGG16, InceptionResNetV2, Inception V3, MobileNetV2, and DenseNet201 have been imple- mented. For a fair comparison, transfer learning and fine-tuning have been applied to the aforementioned five pre-trained models to retrain the models on our own dataset.
To the authors’ best knowledge, no work has been con- ducted in the literature related to wildfire detection that incorporates the SHAP XAI tool with a deep learning model that is tailored to be light-weight and has mini- mal training and testing times, and hence, is suitable to
be deployed on resource-constrained devices, such as drones. The rest of the paper is organized as follows. The
“Literature review” section presents a literature review of the traditional sensor-based and recent deep learning- based fire detection systems. The "Methodology" section entails the methodology used to design the proposed lightweight deep learning FireXnet. It focuses on details like the utilized dataset, the architecture of the proposed model, the use of SHAP XAI, and how transfer learning is applied to pre-trained models for comparison. This section is followed by the "Results" and "Conclusion" sec- tions, respectively.
Literature review
Several types of fire detection systems can be found in the literature. The traditional systems being the sensor- based and the recent ones that utilize images and video feeds along with deep learning to detect wildfires effi- ciently. This section reviews both of the aforementioned fire detection systems and summarizes the positive fea- tures and limitations of both types of fire detection sys- tems. Speaking of traditional sensor-based systems, these systems mainly utilize sensors, such as gas sen- sors, temperature sensors, optical sensors, and infrared sensors (Zhang et al. 2021). For instance, in a study (Li et al. 2019) dedicated to fire prevention, a long-range Raman distributed fiber temperature sensor (RDFTS) is utilized. The technique employed called the Tem- perature Early Warning Model (TEWM), makes use of first and second-order moving average methods to predict temperature trends. The authors claim that this method is particularly useful for predictions intended for varying temperature trends. The utilized tempera- ture sensor has a sensing range of 1.38 km and 28.9 km, respectively, and can predict the temperature trend 46 s in advance. Similarly, a subsequent study (Qiu et al. 2019) employed a laser-based carbon monoxide (CO) sen- sor for fire detection. This system, which incorporates a digital lock-in amplifier (DLIA) for wavelength modula- tion spectroscopy (WMS), significantly enhances detec- tion sensitivity. The study claims that this sensor offers improved sensitivity in fire detection. Another study (Rjoub et al. 2022) explored an energy management sys- tem for forest monitoring and wildfire detection. This system employed a drone equipped with LIDAR (Light Detection and Ranging) and air quality sensors. The researchers developed an autonomous patrolling system that optimizes UAV’s energy consumption while detect- ing wildfire incidents. It claims that by formulating an optimization problem, it was able to minimize the over- all UAV’s energy consumption during patrols. Simulation results demonstrated the efficiency and validity of this solution. Besides, a novel terrestrial system for remote
wildfire detection is described by (Rizanov et al. 2020), which employs single-pixel optoelectronic detectors for triple infrared (IR) band analysis and long-range commu- nication. The system uses a linear classification method and Bayesian theory for data analysis. The researchers claim that the proposed system combines the advan- tages of existing approaches (camera-based systems and Wireless Sensor Networks) and reduces false positives by limiting the influence of external environmental fac- tors on wildfire detection. In another paper (Rizk et al.
2020), researchers proposed a low-cost wireless sensor network for the detection of fires, considering flexibility, scalability, and power consumption requirements. This paper focuses on creating a network of sensors for fire detection. The developed system was tested and verified to be an efficient solution for fire detection in Lebanon’s forests. Another paper (Correia et al. 2021) detects fire by proposing a method that uses energy measurements to localize moving acoustic signals. The sensor in focus is an acoustic sensor, specifically placed on a drone. The localization technique used here includes the Kalman fil- ter. The authors claim that the proposed solution demon- strates superior performance over techniques that do not consider prior knowledge of process states. Despite the promising advancements in sensor-based wildfire detec- tion systems, limitations still exist. The systems often depend on predictive models and algorithms which, although sophisticated, may not account for the com- plexity and unpredictable nature of wildfires. In addition, they can have significant spatial limitations, with some sensors offering a resolution as high as 28.6 m, which might not be sufficient for comprehensive early detec- tion. The energy consumption for drone-based systems and the potential for false positives in terrestrial systems have further constraints. Thus, while these systems seem to be promising, they are not yet fully reliable for real- time, large-scale wildfire detection and prevention.
Regarding the recent utilization and exploration of deep learning algorithms in developing fire detec- tion systems, the authors (Ghali et al. 2022) employ Unmanned Aerial Vehicles (UAVs) for wildfire detec- tion. The detection technique is based on a novel deep ensemble learning method, which combines Efficient- Net-B5 and DenseNet-201 models for identifying and classifying wildfires using aerial images. Additionally, vision transformers (TransUNet and TransFire) and a deep convolutional model (EfficientSeg) are used for segmenting wildfire regions. The authors claim an accuracy of 85.12% for wildfire classification. The authors (Zhao et al. 2018) utilized a UAV equipped with global positioning systems (GPS) for wildfire detection. A new detection algorithm is proposed for the location and segmentation of core fire areas
in aerial images. They also present a 15-layered self- learning Deep Convolutional Neural Network (DCNN) architecture named “Fire_Net” for fire feature extrac- tion and classification. The proposed method claims an overall accuracy of 98% making it suitable for wildfire detection. Moreover, in another paper (Akhloufi et al.
2018), the authors propose a deep convolutional neural network named Deep-Fire for fire pixel detection and fire segmentation. This technique was tested on a data- base of wildland fires, and they claim it provides very high performance. Similarly, a deep learning-based method for detecting wildfires by identifying flames and smoke has been presented by (Oh et al. 2020). The authors claim that their approach, which uses a CNN with a large dataset acquired from the web, is effec- tive for early wildfire detection in a video surveillance system. Moreover, the authors (Toan et al. 2019) have developed an autonomous and intelligent system that uses satellite images with a novel deep-learning archi- tecture for locating wildfires at the pixel level. They claim that their approach has a superior performance over the baselines with a 94% F1-score and is robust against different types of wildfires and adversarial con- ditions. Besides, a comprehensive study of the chal- lenges and limitations of wildfire detection using deep learning has been conducted (Sousa et al. 2020). They have proposed a transfer learning approach combined with data augmentation techniques. They claim their approach provides insights into the patterns causing misclassifications, which can guide future research toward the implementation of expert systems in fire- fighting and civil protection operations. In addition, Unmanned Aerial Vehicles (UAVs) have also been used for wildfire detection (Reis and Turk 2023). The authors employ transfer learning techniques and deep learn- ing algorithms, including InceptionV3, DenseNet121, ResNet50V2, NASNetMobile, and VGG-19, in combi- nation with Support Vector Machine, Random Forest, Bidirectional Long Short-Term Memory, and Gated Recurrent Unit algorithms. They report an accuracy of 97.95% using the DenseNet121 model started with ran- dom weights, and 99.32% accuracy in the DenseNet121 model using ImageNet weights, suggesting the method can be entirely satisfactory for forest fire detection and response.
While the methods proposed in these papers have shown promising results, they also bear limitations.
The use of ensemble methods combining multiple deep learning architectures, such as in (Ghali et al.
2022), may incur a high computational cost and time.
Despite achieving high accuracy, the real-time appli- cability of these systems under varying fire and envi- ronmental conditions remains a concern (Zhao et al.
2018; Oh et al. 2020; Toan et al. 2019; Reis and Turk 2023). In (Sousa et al. 2020), although transfer learn- ing was used to leverage existing data, the analysis pointed out the issue of misclassifications due to pat- terns, indicating the model’s difficulty in dealing with complex real-world scenarios. The common limitations of these studies are high computational cost, signifi- cant inference time, model size, and the total number of parameters. These limitations suggest that these deep learning-based studies could further benefit from new improvements. There is a need for more tailored and efficient deep learning models that can ensure high accuracy, faster inference time, and robustness to vari- ous conditions for early wildfire detection.
Methodology
The methodology adopted to design the lightweight deep learning model with SHAP as an XAI tool is given in Fig. 1. The methodology consists of three important phases, i.e., (i) the dataset pre-processing phase, which involves the bifurcation of data in two different classes and the use of data augmentation techniques, (ii) the detection and inference phase, which focuses on the data splitting and the design of the deep learning model, and finally (iii) The explainable AI phase that incorporates SHAP as an XAI tool to interpret the proposed model’s decisions by computing the contribution of each feature to the prediction.
Dataset
For the detection of wildfire, we utilized multiple data- set sources to include a diverse image for our model’s training. The primary dataset, obtained from Kaggle (Kaggle 2021), comprises two distinct classes: Fire and No Fire, encompassing a total of 1900 images. In addi- tion, Smoke images were acquired from Github’s DFire- Dataset (DFireDataset 2023), and the thermal fire images were sourced from IEEE DataPort’s Flame 2 dataset (Flame 2, 2023). There are four classes in total:
Fire, No Fire, Smoke, and Thermal Fire. This diverse dataset comprises a total of 3800 images (950 images for each class). Moreover, of these 3800 images, 90%
(3420 images) were used for training and validation purposes, and the rest of 10% (380 images) were used for testing. The 3420 images from the training and vali- dation portion were further been divided into an 80:20 ratio, i.e., 2736 (80%) images were specified for train- ing and the rest of 684 (20%) for validation, without any data overlapping. Details pertaining data splitting are given in Table 1 and the sample images represent- ing different classes are given in Fig. 2. Moreover, for the proper classification, input images need to be
Fig. 1 The methodology adopted for explainable deep learning for fire detection
Table 1 Details of the dataset formulation and splitting
Classes Dataset
Training Validation Testing Total
Fire 684 171 95 950
No Fire 684 171 95 950
Smoke 684 171 95 950
Thermal Fire 684 171 95 950
Total 2736 684 380 3800
pre-processed using data augmentation techniques.
The dataset has been augmented by applying (1) rota- tion, (2) width shift, (3) height shift, and (4) zoom. The samples of images after data augmentation techniques are given in Fig. 3.
The proposed FireXnet model
The main objective of this research is to propose a deep learning model specially tailored to accu- rately detect wildfires through images. The model is intended to be lightweight, i.e., it should have consid- erably fewer trainable and non-trainable parameters, reduced computation time, and improved accuracy.
A sequential CNN is chosen as the base model and is modified to achieve the aforementioned objectives.
Sequential CNN is chosen because it is a simple and commonly preferred architecture for image classifica- tion applications due to its ease of implementation.
Furthermore, it is easy to modify it to achieve the desired model that in addition to being accurate, is more practical and can be easily deployed on resource- constrained devices. The proposed model contains three sequential convolution blocks, and a max pool- ing layer is appended after the last convolution layer in each block. Furthermore, a global average pooling layer has also been added after the third convolution block, it takes the average of each feature map and feeds it to the classifier block. The global average pool- ing enforces the correspondence between feature maps and categories, due to which, the feature maps can
be easily interpreted as categorized confidence maps.
The architecture of the proposed modified lightweight model FireXnet is given in Fig. 4. In each convolutional layer, the rectified linear unit (ReLU) activation func- tion is used. The first block consists of two convolu- tional layers (each with 64 filters having a 3 × 3 kernel size) and a max pooling layer (2 × 2 kernel size). Simi- larly, the second block is made up of two convolutional layers (each with 128 filters having a 3 × 3 kernel size) and a max pooling layer (2 × 2 kernel size).
The third block, however, is made up of three con- volutional layers (each with 128 filters of 3 × 3 kernel size) and a max pooling layer (2 × 2 kernel size). After flattening, in the fully connected layers, there are two dense layers with the ReLU activation function having 128 and 32 units, respectively. The addition of a batch normalization layer holds benefits, such as acceler- ating network training and reducing the difficulty of initializing early weights. The dropout layers, on the other hand, help in preventing the neural network from over-fitting. In the last layer, the softmax activa- tion function is used with four units. It provides the output for four classes, i.e., Fire, No Fire, Smoke, and Thermal Fire. The summary of the proposed model is given in Table 2.
Explainable AI using SHAP (SHapley Additive exPlanations) SHAP is an explainable AI method that uses princi- ples from cooperative game theory to help explain the Fig. 2 Samples of the wildfire dataset assessed through Kaggle (Kaggle 2021), Github (DFireDataset 2023), and IEEE DataPort (Flame 2, 2023)
Fig. 3 Samples of the images after data augmentation
performance of machine learning or deep learning models. It provides a comprehensive view of how each feature or input variable contributes to the model’s prediction for a particular instance. The visualization displays the SHAP values, which represent the average marginal contribution of each feature to the prediction across all possible feature combinations. This allows users to understand the relative importance of differ- ent features in the model’s decision-making process and identify potential biases or anomalies. SHAP uses a model that ensures three properties: local accuracy, missingness, and consistency.
Local accuracy requires that for a particular input a , the explanatory model g matches the output ∅ of the original model f . This means that the predicted value of the explanatory model must match the output of the original model when a certain simplified input a′ is given.
This is explained in Eq. (1).
Missingness is the second property, which states that if certain features are missing in the original input, they will not have an impact on the outcome.
(1) f(a)=g a′ =∅0+
N
i=1
∅ia′i Fig. 4 Architecture of the proposed FireXnet model
Table 2 Summary of the proposed FireXnet model
Layer (type) Output shape Parameters
input_15 (Input Layer) (None, 224, 224, 3) 0 block1_conv1_ (Conv2D) (None, 224, 224, 64) 1792 block1_conv2_ (Conv2D) (None, 224, 224, 64) 36928 block1_pool_ (MaxPooling2D) (None, 112, 112, 64) 0 block2_conv1_ (Conv2D) (None, 112, 112, 128) 73856 block2_conv2_ (Conv2D) (None, 112, 112, 128) 147584 block2_pool_ (MaxPooling2D) (None, 56, 56, 128) 0 block3_conv1_ (Conv2D) (None, 56, 56, 256) 295168 block3_conv2_ (Conv2D) (None, 56, 56, 256) 590080 block3_conv3_ (Conv2D) (None, 56, 56, 256) 590080 block3_pool_ (MaxPooling2D) (None, 28, 28, 256) 0 global_average_pooling2d_10_ (None, 1, 1, 256) 0 batch_normalization_42_ (None, 1, 1, 256) 1024
flatten_14 (Flatten) (None, 256) 0
dense_42 (Dense) (None, 128) 32896
batch_normalization_43_ (None, 128) 512
dropout_34_ (Dropout) (None, 128) 0
dense_43_ (Dense) (None, 32) 4128
batch_normalization_44_ (None, 32) 128
dropout_35_ (Dropout) (None, 32) 0
dense_44_ (Dense) (None, 4) 132
Consistency, the third property, suggests that if a fea- ture becomes more influential in a model, the value assigned to that feature will not decrease.
For all inputs y′ ∈ {0, 1} , then ∅i
f′,a
≥∅i f,a However, there are limitations with these properties as they are not always understood in other additive feature attribution methods (Wang et al. 2021). The only model that satisfies all these properties is given by Eq. (4).
where y′
is a non-zero entry in y′ . The solution to Eq. (4) has been proposed by Lundberg and Lee (Lundberg et.al.
2017), where fx
y′
=f ha
y′
=E f
y
|yS
. Here S repre- sents the non-zero indices in y′ , known as SHAP values.
Unlike traditional methods of interpreting the impor- tance of features in machine learning models, SHAP has the added advantage of determining whether each input feature contributes positively or negatively. A sample of the SHAP visualization for a “fire” predicted image is given in Fig. 5.
Optimization of pre‑trained models for comparison To compare the performance of the proposed Fire- Xnet, five standard pre-trained models, i.e., VGG16,
(2) a′i=0=⇒∅i=0
(3) F′a
y′
−f′a y′\i
≥fa y′
−fa y′\i
(4)
∅i f,a
=
y′⊆a′
y′
!
M−
y′
−1
! M!
fx
y′
−fx
y′\i
InceptionResNetV2, Inception V3, MobileNetV2, and DenseNet201 have been utilized. For a fair compari- son, transfer learning has been applied to the afore- mentioned five pre-trained models to retrain the models on our own dataset and to compare the accu- racy of these models with the proposed model. In addi- tion to applying transfer learning, fine-tuning has also been applied to these models to improve their accu- racy to an optimized level. The architectural details of all five models implemented in this paper are given in Table 3.
Applying transfer learning on pre‑trained models
To train CNN from scratch, a significantly large amount of data is needed, which is not always possible.
However, using a pre-trained model with pre-trained weights can eliminate the need for a large training dataset. Therefore, pre-trained models have been pre- ferred in this study for comparison purposes, which had already been trained on the ImageNet dataset.
ImageNet is a large-scale labeled dataset that contains more than 14 million images spanning over 20,000 classes. The utilized transfer learning approach is given in Fig. 6.
Transfer learning was employed by freezing all pre-existing layers except the final two, which were retrained using our data. We added a layer for batch normalization before the fully connected layer to each model. Following this, a “flattening” process occurred, and a dense layer was added with 128 units using the ReLU activation function. After this, we incorporated
Fig. 5 Sample of SHAP showing the contribution of the features involved in the decision
both batch normalization and a dropout layer with a size of 0.4. Another dense layer was added with a drop- out size of 0.3, followed by a final dense layer with four units using the softmax activation function. This final layer determines the four class outputs. We selected softmax because it ensures that the output probabili- ties are interconnected and always total “1”. Thus, an increase in one class’s output probability will cause a decrease in the other, keeping the total constant at “1”
(Umair et al. 2021).
Results
Performance parameters
The proposed model FireXnet and the five pre-trained models, i.e., VGG16, InceptionResNetV2, InceptionV3, DenseNet201, and MobileNetV2 were evaluated on the prepared dataset. The GPU utilized for training the models is NVIDIA Tesla T4 (16 GB) GPU accessed via Google Colab. The pixel size of all images has been kept the same, i.e., 224 × 224. For all the models, a batch size of 64 was used and the number of epochs was 100. Fur- thermore, a learning rate of 0.00001 was used to train
each model. The performance of all the models has been evaluated by calculating the key performance param- eters given in Eqs. (5, 6, 7 and 8), i.e., accuracy, recall, precision, and the F1-score. The results of the perfor- mance parameters of the proposed FireXnet model and the models used for comparison are given in Table 4.
The values used in Eqs. (5, 6, 7 and 8), i.e., true posi- tive (TP), false positive (FP), true negative (TN), and false negative (FN) were extracted from the confusion (5) Accuracy= TN+TP
TP+TN+FN+FP
(6) Recall= TP
TP+FN
(7) Precision= TP
TP+FP
(8) FI Score=2×Recall×Precision
Recall+Precision Table 3 Architectural parameters of transfer learning models
Models Depth (layers) Parameters (in million) Input layer size Output
layer size
VGG16 23 138.3 (224, 224, 3) (4, 1)
InceptionResNetV2 572 55.8 (224, 244, 3) (4, 1)
InceptionV3 159 23.8 (224, 224, 3) (4, 1)
MobileNetV2 88 3.5 (224, 224, 3) (4, 1)
DenseNet201 201 20.2 (224, 224, 3) (4, 1)
Fig. 6 Summary of the utilized transfer learning approach
matrices of each CNN model. The confusion matrices have been generated for the 380 images initially sepa- rated for testing purposes. The confusion matrices of all the models in this research are given in Fig. 7.
Other salient performance metrics, such as training accuracies and training losses, as well as the validation accuracies and validation losses at different epochs, have also been calculated and drawn to gauge the train- ing performance of all models. The results of the afore- mentioned performance metrics at different numbers of Epochs are given in Table 5, whereas the graphs are presented in Figs. 8 and 9.
It is evident from the graphs given in Figs. 8 and 9 that the Proposed FireXnet has the least training and validation loss and the highest training and validation accuracy among all the models under investigation.
Also, from the results of performance parameters given previously in Table 4, it can be observed that the pro- posed model, FireXnet outperforms all other models in terms of accuracy, recall, and F1-score.
Total parameters and training time
The main objective, as described earlier, was to reduce the total parameters (trainable + non-trainable) of the proposed CNN, which could effectively reduce the train- ing and testing time as well. The total calculated param- eters and training time of all the models including the proposed light-weight model are given in Table 6. It can be observed that the proposed model has the minimum training time with significantly a smaller number of parameters as compared to the other five models. Fur- thermore, a graphical comparison of the total param- eters and the training + testing time is shown in Fig. 10.
SHAP results
The XAI tool used for interpretability in this paper is SHAP.
The combination of SHAP and CNN enhances the model’s interpretability and allows for the identification of key char- acteristics triggering wildfire detections. The visualization
displays the SHAP values, which represent the average marginal contribution of each feature to the prediction across all possible feature combinations. SHAP has the added advantage of determining whether each input feature contributes positively or negatively. The SHAP visualization results of the proposed FireXnet model are given in Fig. 11.
Analysis and discussion
In this paper, we intended to design a deep learning model for wildfire detection that is not only efficient and accurate but also has a lightweight architecture that can easily be deployed on resource-constrained devices, such as drones. To achieve this goal, a sequential con- volutional layer architecture was chosen and tailored in such a way that it had considerably reduced training and non-training parameters and had less training and test- ing time. These objectives were achieved by tailoring the selected architecture. The proposed model was designed to have three sequential convolution blocks with a max pooling layer appended after the last convolution layer in each block. Furthermore, a global average pool- ing layer was added after the third convolution block, it takes the average of each feature map and feeds it to the classifier block. The global average pooling enforces the correspondence between feature maps and catego- ries, due to which, the feature maps can be easily inter- preted as categorized confidence maps. The results of the proposed FireXnet model and all other models that were chosen and optimized for comparison have already been presented in the previous section. FireXnet uses three blocks of convolution layers and is good at learn- ing high-level features. However, this focus might make it less suitable for datasets that need a lot of low-level feature extraction. In this section, we’ll discuss and ana- lyze the various experiments performed on the models for fine-tuning, optimization, and accurate prediction.
Performance based on optimizers
To select and finalize an optimizer that could be applied uniformly across all models in this research, Table 4 Results of performance parameters of all the models
Models Test accuracy (%) Precision (%) Recall (%) F1 score (%)
The proposed FireXnet model 98.42 98.42 98.42 98.42
DenseNet201 97.63 97.63 97.95 97.95
VGG16 94.75 94.75 94.75 94.75
MobileNetV2 98.16 98.16 98.16 98.16
InceptionResNetV2 96.05 96.05 96.05 96.05
InceptionV3 96.84 96.84 96.84 96.84
Fig. 7 The confusion matrices of a the proposed FireXnet model, b Densenet201, c VGG16, d MobileNetV2, e InceptionResNetV2, f InceptionV3
Table 5 Training performance of all models on different numbers of Epochs
Model Epochs Train loss Valid loss Training accuracy Validation accuracy
The proposed FireXnet model 1 1.0284 1.0284 58.73% 65.64%
99 0.0252 0.0474 98.54% 98.31%
100 0.0267 0.0349 98.52% 98.42%
DenseNet-201 1 1.0973 0.5369 58.59% 83.33%
99 0.0416 0.0364 97.24% 97.12%
100 0.0395 0.0495 97.67% 97.63%
VGG16 1 1.4186 1.1477 45.94% 44.30%
99 0.1531 0.1233 95.39% 95.91%
100 0.1480 0.1293 95.65% 96.05%
MobileNetV2 1 1.2676 0.5303 49.99% 80.99%
99 0.0412 0.0839 99.38% 97.51%
100 0.0421 0.0714 99.27% 97.95%
InceptionResNetV2 1 1.2576 0.7130 45.72% 83.92%
99 0.0801 0.0664 97.37% 97.66%
100 0.0992 0.0716 96.89% 97.51%
InceptionV3 1 1.1651 0.5102 52.16% 90.64%
99 0.0790 0.0706 97.59% 97.66%
100 0.0762 0.0719 97.51% 97.51%
(a) (b) (c)
(d) (e) (f)
Fig. 8 Training and validation accuracy graphs of a the proposed FireXnet model, b VGG16, c InceptionResNetv2, d InceptionV3, e MobileNetV2, f DenseNet201
a comprehensive evaluation of four prominent opti- mizers was conducted: RMSProp, SGD, Adam, and Adadelta. The objective was to identify the optimizer that best aligns with the specific demands of our mod- els and enhances their performance. During this opti- mizer selection process, an array of parameters was meticulously investigated. A diverse set of batch sizes, including 16, 32, 64, and 128, was employed, and each optimizer was evaluated over 100 epochs. Multiple learning rates were explored, spanning (0.001, 0.0001, and 0.00001), with careful attention to their impact on convergence and stability. After a thorough analy- sis of loss graphs and performance metrics, it was determined that a learning rate of 0.00001 batch size of 64 and Adam optimizer exhibited the most favora- ble outcomes across the board. Performance results of
the aforementioned optimizers on the proposed Fire- Xnet are given in Table 7. The results indicate that the Adam optimizer outperformed the other three opti- mizers. Therefore, the Adam optimizer was finalized to be applied to all models.
Performance based on reliability parameters
The confidence interval for each model has been calcu- lated to quantify the uncertainty of the estimate and to present the classification skill of all models. Table 8 shows the true and predicted labels derived from the confu- sion matrices, as well as the accuracies and 95% confi- dence intervals (CI) for each model. Moreover, to verify the reliability of all models, the Cohen kappa coefficient (κ) for all models has also been calculated. If the value of “κ” is between 0.61 and 0.80, the results are said to be
(a) (b) (c)
(d) (e) (f)
Fig. 9 Training and validation loss graphs of a the proposed FireXnet model, b VGG16, c InceptionResNetv2, d InceptionV3, e MobileNetV2, f DenseNet201
Table 6 Comparison of parameters and training time of all the models
Models Total parameters Trainable parameters Non‑trainable
parameters Training time
(HH:MM:SS) Model size (MBs)
The proposed FireXnet model 1,774,308 37,988 1,736,320 01:13:10 6.73
DenseNet201 30,369,252 12,046,948 18,322,304 02:27:35 115.22
VGG16 17,930,980 3,215,972 14,715,008 01:49:37 67.95
MobileNetV2 10,291,172 8,032,868 2,258,304 02:10:35 39.12
InceptionResNetV2 59,256,324 4,919,588 54,336,736 01:49:26 224.92
InceptionV3 28,360,772 6,557,988 21,802,784 01:59:46 107.68
in considerable accordance with the given data. While a kappa value between 0.81 and 1.00 reflects the nearly per- fect agreement of results (McHugh 2012). A confidence interval of [0.97, 1.00] validates the certainty of the pro- posed FireXnet model’s estimate and a kappa coefficient of 0.98 proves that the results of the proposed model are in considerable accordance with the given data.
Performance based on prediction
The proposed FireXnet model accurately predicted the test images. Figure 12 shows the accurately predicted
Fire, No Fire, Smoke, and Thermal Fire cases for the test images. The correct prediction of wildfire, No Fire, Smoke, and Thermal Fire indicates the efficacy of all the proposed models.
Conclusion
Deep learning models pose challenges of high compu- tational complexity and inference time when deployed on resource-constrained devices used in wildfire detec- tion. To address the latency issues and to create a prac- tically deployable model, a tailored lightweight deep learning model named FireXnet has been proposed and Fig. 10 a Trainable and non-trainable parameters of all the models. b Training and testing time of all models
Fig. 11 SHAP visualization results of the proposed FireXnet model a Fire Class, b No Fire Class, c Smoke Class, and d Thermal Fire Class
evaluated in this paper. FireXnet combines the robust fea- ture extraction of CNN with the transparency of explain- able AI using SHAP (SHapley Additive Explanations).
This integration enhances the model’s interpretability and allows for the identification of key characteristics trig- gering wildfire detections. In addition to being accurate, FireXnet has reduced computational complexity due to considerably fewer training and non-training parameters
and significantly fewer training and testing times. The proposed model has also been compared with five pre- trained models, i.e., InceptionResNetV2, InceptionV3, DenseNet201, VGG16, and MobileNetV2. For a fair comparison, these models were retrained using transfer learning on the same dataset that was used to train the proposed model. The FireXnet model achieved an accu- racy of 98.42%, which is greater than all other models used Table 7 Classification performance comparison among different optimizers
Models Optimizer Accuracy (%) Precision (%) Recall (%) F1‑score (%)
The Proposed FireXnet SGD 96.25 96.25 96.25 96.25
Adadelta 97.16 97.16 97.16 97.16
RMSProp 97.46 97.46 97.46 97.46
Adam 98.42 98.42 98.42 98.42
Table 8 Confidence intervals and Cohen’s kappa coefficient for all models
Models True labels Predicted labels Accuracy (%) 95% CI Kappa
Fire No Fire Smoke Thermal
The proposed FireXnet model Fire 93 0 2 0 98.42 [0.97, 1.00] 0.98
No Fire 1 92 2 0
Smoke 0 1 94 0
Thermal 0 0 0 95
DenseNet201 Fire 90 0 5 0 97.63 [0.96, 0.99] 0.97
No Fire 1 93 1 0
Smoke 2 0 93 0
Thermal 0 0 0 95
VGG16 Fire 90 3 2 0 94.75 [0.92, 0.96] 0.95
No Fire 2 89 4 0
Smoke 3 6 86 0
Thermal 0 0 0 95
MobilenetV2 Fire 91 0 4 0 98.16 [0.97, 1.00] 0.98
No Fire 1 92 2 0
Smoke 0 0 95 0
Thermal 0 0 0 95
Inception-ResNetV2 Fire 92 1 2 0 96.05 [0.95, 0.98] 0.96
No Fire 2 92 1 0
Smoke 4 5 86 0
Thermal 0 0 0 95
InceptionV3 Fire 92 1 2 0 96.84 [0.95, 0.98] 0.97
No Fire 1 88 6 0
Smoke 2 0 93 0
Thermal 0 0 0 95
for comparison. To confirm the reliability of the proposed model, the reliability parameters like confidence intervals and Cohen’s kappa coefficient have also been calculated.
A confidence interval of [0.97, 1.00] validates the certainty of the proposed model’s estimate and a kappa coefficient of 0.98 proves that the results of the proposed model are in considerable accordance with the given data.
Acknowledgements
The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work. The author would like to thank Prince Sultan University for their support.
Authors’ contributions
All authors contributed to the development of this manuscript. The authors read and approved the final manuscript.
Funding
This research is funded by the Deanship of Scientific Research at Najran University under the Research Groups Funding program grant code (NU/RG/SERC/12/28).
Availability of data and materials
The pre-processed dataset used in this manuscript is available from the cor- responding author on reasonable request.
Declarations
Ethics approval and consent to participate Not applicable.
Consent for publication Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1 Faculty of Engineering and Technology, Multimedia University, Melaka, Bukit Beruang 75450, Malaysia. 2 School of Computing, Engineering and the Built Environment, Edinburgh Napier University, Edinburgh EH10 5DT, UK.
3 Department of Cyber Security, Pakistan Navy Engineering College, NUST, Karachi 75350, Pakistan. 4 Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia. 5 RIADI Laboratory, National School of Computer Sciences, University of Manouba, Manouba 2010, Tunisia.
6 Department of Computer Science, College of Computer Science and Infor- mation Systems, Najran University, Najran 61441, Saudi Arabia. 7 Depart- ment of Software Engineering, Computer Science, Al Ain University, Abu Dhabi 1555, United Arab Emirates.
Received: 1 July 2023 Accepted: 30 August 2023
References
Akhloufi, M.A., R.B. Tokime, and H. Elassady. 2018. Wildland fires detection and segmentation using deep learning. In Proceedings Volume 10649, Pattern Recognition and Tracking XXIX, 106490B. Orlando: SPIE.
Ben Atitallah, S., M. Driss, W. Boulila, and H. Ben Ghezala. 2022. Randomly initialized convolutional neural network for the recognition of COVID-19 using X-ray images. International Journal of Imaging Systems and Technol- ogy 32 (1): 55–73.
Fig. 12 Accurately predicted images by the FireXnet model: a Fire, b No Fire, c Smoke, and d Thermal Fire