Predicting Energy Consumption in Wastewater Treatment Plants through Light Gradient Boosting Machine: A Comparative Study

Item Type Conference Paper

Authors Alali, Yasminah H; Harrou, Fouzi; Sun, Ying

Citation Alali, Y., Harrou, F., & Sun, Y. (2022). Predicting Energy Consumption in Wastewater Treatment Plants through Light Gradient Boosting Machine: A Comparative Study. 2022 10th International Conference on Systems and Control (ICSC). https://doi.org/10.1109/icsc57768.2022.9993872

Eprint version Post-print

DOI 10.1109/ICSC57768.2022.9993872

Publisher IEEE

Rights This is an accepted manuscript version of a paper before final publisher editing and formatting. Archived with thanks to IEEE.

Download date 2024-01-09 22:40:53

Link to Item http://hdl.handle.net/10754/686762


Predicting Energy Consumption in Wastewater Treatment Plants through Light Gradient Boosting Machine: A Comparative Study

Yasminah Alali¹, Fouzi Harrou¹ and Ying Sun¹

Abstract— Water quality and availability worldwide are greatly affected by climate change driven by global warming. For instance, water levels in numerous European rivers have recently decreased compared with their levels over past centuries. Treated and desalinated water represents a promising strategic option to mitigate water scarcity. The overall cost of wastewater treatment plants (WWTPs) depends significantly on their energy consumption. Precise prediction of WWTP energy consumption helps to understand and anticipate plant behavior, which supports process design and monitoring and improves overall performance. This paper presents a practical machine-learning approach to predict WWTP energy consumption using the Light Gradient Boosting Machine (LightGBM). Data collected over five years from a full-scale WWTP in eastern Melbourne are employed in this study. Results indicate that LightGBM is more accurate than the thirteen other commonly used machine-learning models considered. In addition, including lagged measurements when constructing the investigated models improves the prediction accuracy. The dynamic, optimized LightGBM model outperformed all other models, with a root-mean-square error of 37.38 and a mean absolute error of 28.63.

I. INTRODUCTION

In recent years, social and economic development worldwide has increased wastewater production, putting growing pressure on wastewater treatment plants (WWTPs). Notably, treated water from WWTPs is a promising option to mitigate the threat of water scarcity: it can be used for purposes such as irrigation or aquariums, or discharged with a low level of pollution [1]. Meanwhile, WWTPs must meet stringent pollution standards in line with sustainable development [2], [3]. Many WWTPs use excessive chemicals during the treatment process, resulting in unnecessary waste of energy and materials [4], [5]. To some, such high energy and material consumption contradicts the idea of sustainability. Therefore, it is crucial to schedule energy consumption sensibly while meeting the discharge standard.

WWTPs are complex processes that remove pollutants from wastewater through microbiological reactions [6]. Reducing pollution in this energy-intensive process consumes a considerable amount of energy, so pollution reduction and energy conservation are competing goals. With Artificial Intelligence (AI), it is possible to balance these two divergent goals. AI has increasingly been integrated into wastewater treatment processes over the past two decades. Designing an efficient biological treatment process is intrinsically challenging because it involves combining various types of microbial communities, natural environments, and influent characteristics, each with its own inherent uncertainties. While governing legislation calls for consistent effluent standards, financial constraints demand energy conservation. Given the time-varying and uncertain nature of the process, advanced control solutions, such as AI, are necessary to reconcile effluent quality with efficient operation. AI is a powerful tool that can serve many of the operational requirements described above [7].

*This work was not supported by any organization.

¹Yasminah Alali, Fouzi Harrou and Ying Sun are with King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, 23955-6900, Saudi Arabia.

Machine learning has recently been explored to support the control of WWTPs by predicting how much energy they consume [8], [9]. In [10], a Random Forest (RF) model is adopted for predicting the energy consumption of WWTPs. The evaluation data are taken from the China Urban Drainage Yearbook and contain 2387 records. Results indicate that the RF model performs reasonably well, with an R² of 0.702. However, local climate and technology were not considered in that study. In [11], various factors, including wastewater, were examined to determine their effect on energy consumption, based on Melbourne water company data spanning 2014 through 2019. Several prediction models were examined, including Artificial Neural Network (ANN), Gradient Boosting Machine (GBM), and RF. In terms of root mean squared error (RMSE) and mean absolute error (MAE), the GBM had the lowest values in the test phase, 33.9 and 26.9, respectively. In [12], logistic regression was employed to estimate the energy consumption of a WWTP. The effectiveness of this approach was assessed using 403 records collected in Romania between 2015 and 2017. The logistic regression achieved a high accuracy of 80%. However, the model was trained without considering all parameters that may affect water quality.

Recently, in another study, ANN, K-Nearest Neighbor (KNN), support vector machine (SVM), and linear regression models were used to predict the approximate energy consumption of WWTPs in order to reduce their costs [13]. Because no electrical meter was available for monitoring, energy consumption was computed from the Tenaga Nasional Berhad (TNB) electricity bills from March 2011 to February 2015. The comparison revealed that the ANN model produced the lowest error, with an RMSE of 52084. In [14], machine-learning methods were used to predict energy consumption in 317 WWTPs in northwest Europe, mainly in the Netherlands, France, Denmark, Belgium, Germany, Austria, and Luxembourg. Specifically, ANN and RF models were compared based on the average coefficient of determination (R²): RF obtained 0.82 and ANN reached 0.81, indicating that RF slightly outperformed ANN. In [15], the aim is to perform daily benchmark analysis in order to save energy in WWTPs. Using the Solingen-Burg dataset, which covers 120,000 individuals, the researchers evaluated Support Vector Regression (SVR), ANN, and RF algorithms. After validation and testing, RF was found to be the most efficient algorithm, with an R² of 0.71 in the test phase. In [16], neural networks were used to optimize conventional activated sludge WWTPs and reduce their energy consumption. Data from 317 WWTPs in northwest Europe were used to train this technique. The ANN obtained an R² between 74.2% and 82.4% during testing. However, neither the aeration time nor the total power consumed by the aerators was measured. In [17], ANN models were investigated for optimizing the energy consumption of a WWTP. Approximately 1400 data instances were used, and the ANN achieved an MAE of 11.668 and a MAPE of 0.0270. Additionally, in [18], ANNs were used to optimize WWTP energy consumption. The dataset covers the period from July 2010 to January 2012. The ANN achieved an MAE of 0.78 and a MAPE of 0.02. However, the researchers did not take all of the WWTP parameters into account when training this model.

This study aims to predict the energy consumption of a WWTP using an enhanced Light Gradient Boosting Machine (LightGBM) approach. In other words, the aim is to explore the capacity of the optimized LightGBM model to predict energy consumption. This choice is mainly motivated by LightGBM's ability to speed up training and achieve higher accuracy than other gradient-boosting frameworks. The method is called "LightGBM" because of its fast training process and low memory usage [19]. Here, the Bayesian optimization algorithm is employed to calibrate the LightGBM model. To assess the performance of LightGBM in predicting energy consumption, we conducted a comparison study with 13 other machine-learning models, including support vector regression (SVR) with various kernels, bagged trees (BT), boosted trees (BST), decision trees (DT), random forests (RF), and Gaussian process regression (GPR).

We used 5-fold cross-validation to train these models and adopted Bayesian optimization for hyperparameter tuning during training. Real data collected over five years from a WWTP in eastern Melbourne are employed in this study. Results highlight that the LightGBM approach outperformed the other models. It is important to note that standard machine-learning models do not inherently account for the time-dependent nature of energy consumption. This study addresses this limitation by including lagged measurements as inputs, which improves the models' prediction performance.

The remainder of this paper is organized as follows. Section II briefly describes the WWTP dataset. Section III briefly describes the models and the Bayesian optimization algorithm used in the analysis. Section IV presents the results and discusses the comparisons among the models. Section V outlines the conclusions.

II. DATA DESCRIPTION

The experiments in this study are based on multivariate data with 1382 records gathered between January 2014 and June 2019 from the Melbourne water treatment plant and airport weather station (https://data.mendeley.com/datasets/pprkvz3vbd/1). The data comprise power consumption, biological, hydraulic, and climate variables. Water quality and biological characteristics were measured by sensors. Daily reports were produced from the power consumption data of the revenue-quality meters. Moreover, the Melbourne airport weather station provides meteorological and temperature reports. Because no sampling took place during holidays and weekends, a few records have null values for the biological variables. Furthermore, data points with extremely low or extremely high power consumption were considered outliers and eliminated; about 5% of the data points were removed [1].

Figure 1 displays boxplots of the yearly WWTP power consumption over the studied period, from 2014 to 2019. In 2019, the WWTP exhibited improved performance, with reduced energy consumption compared with the preceding years: both the average and the standard deviation of the annual energy consumption distribution decreased.

Fig. 1. Distribution of yearly energy consumption in the studied WWTP.

Figure 2 illustrates boxplots of the monthly energy consumption at the WWTP over the considered period. We observe that power consumption shows a larger variance in the hot period of October, November, and December. This could be due to high water consumption during the hottest period, as well as to tourism. Indeed, in Australia, the hottest period of the year usually occurs in these three months.


Fig. 2. Distribution of monthly energy consumption of the studied WWTP.

III. METHODOLOGY

A. LightGBM model

LightGBM is Microsoft's open-source Gradient Boosting Decision Tree (GBDT) framework. Its distinguishing feature is that it grows trees vertically (leaf-wise), whereas other algorithms grow them horizontally (level-wise). The LightGBM model has two main improvements: the histogram algorithm and the leaf-wise strategy with depth limitation. The histogram algorithm discretizes continuous feature values into K integer bins and builds a histogram with K bins. The statistics accumulated in this histogram are then used to find the optimal split point of the decision tree. The leaf-wise strategy grows the leaf with the maximum delta loss. For a fixed number of leaves, a leaf-wise algorithm tends to achieve a lower loss than a level-wise algorithm [20]. In the leaf-wise strategy, the depth and number of leaves are limited, which helps reduce the model's complexity and prevent overfitting. The LightGBM model is given by:

$$F(x) = \sum_{m=1}^{M} f_m(x), \tag{1}$$

where $F(x)$ is the final output and $f_m(x)$ is the output of the $m$-th weak regression tree.
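The histogram-based split search described above can be sketched for a single feature as follows. This is a simplified illustration, not LightGBM's implementation: it assumes equal-width bins, accumulates sums of the target instead of gradient statistics, and scores each candidate split by the reduction in squared error. The function name `histogram_best_split` and the sample values are hypothetical.

```python
import numpy as np

def histogram_best_split(x, y, n_bins=16):
    """Find a split threshold for one feature using a K-bin histogram."""
    # Discretize the continuous feature into integer bin indices 0..n_bins-1
    # (equal-width bins are an assumption of this sketch).
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = np.digitize(x, edges[1:-1])

    # Accumulate per-bin statistics: sample count and sum of targets.
    count = np.bincount(bins, minlength=n_bins).astype(float)
    total = np.bincount(bins, weights=y, minlength=n_bins)

    # Scan the K-1 bin boundaries; bins 0..b go left, the rest go right.
    left_n = np.cumsum(count)[:-1]
    left_s = np.cumsum(total)[:-1]
    right_n = count.sum() - left_n
    right_s = total.sum() - left_s

    # Reduction in squared error obtained by each candidate split.
    valid = (left_n > 0) & (right_n > 0)
    gain = np.full(n_bins - 1, -np.inf)
    gain[valid] = (left_s[valid] ** 2 / left_n[valid]
                   + right_s[valid] ** 2 / right_n[valid]
                   - total.sum() ** 2 / count.sum())

    best = int(np.argmax(gain))
    # Samples with x < threshold fall in the left child.
    return edges[best + 1], gain[best]

# Hypothetical data: the target jumps between x = 4 and x = 10.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0])
y = np.array([5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0])
threshold, gain = histogram_best_split(x, y, n_bins=4)
```

Because only K-1 boundaries are scanned instead of every distinct feature value, the cost of the search depends on the number of bins rather than on the number of records, which is the source of the speed and memory savings attributed to the histogram algorithm.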

Over the past few years, the LightGBM model has proven to be highly efficient in practical applications. Several aspects of LightGBM give it an advantage over other boosting models. LightGBM is a histogram-based algorithm that buckets feature values, which enables faster and more accurate training and requires less memory. It is best suited to large and complex datasets, since it is sensitive to overfitting and can easily overfit small datasets. LightGBM also supports parallel learning and GPU learning [21].

B. Bayesian Optimization

Tuning hyperparameters is time-consuming and computationally expensive when building a machine-learning model, so a tuning mechanism should find good hyperparameters quickly. Several methods exist for hyperparameter tuning, including grid search, random search, and Hyperband, each with its own advantages and disadvantages [22]. Bayesian optimization offers an alternative to these techniques. In Bayesian optimization, all previous evaluations of the function $f(x)$, rather than local gradients, are used to obtain the most accurate estimates [23]. Bayesian optimization (BO) was used in this study to determine the optimal hyperparameters of the studied methods. Table I lists the fourteen prediction methods considered to predict WWTP energy consumption: SVR methods, GPR methods, the BT and BST ensemble learning techniques, RF, DT, and LightGBM.

TABLE I

MACHINE LEARNING METHODS INVESTIGATED IN THIS STUDY.
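To make the idea of reusing all previous evaluations concrete, the following is a minimal sketch of a Bayesian optimization loop for a single hyperparameter. The paper does not state which surrogate model or acquisition function was used; this sketch assumes a Gaussian-process surrogate with a squared-exponential kernel and the expected-improvement criterion. The objective `toy_validation_error` is a hypothetical stand-in for a cross-validated error, and all function names are illustrative.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a GP fitted to all observations."""
    y_mean = y_obs.mean()
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    mu = y_mean + K_s.T @ np.linalg.solve(K, y_obs - y_mean)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best value seen so far (minimization)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(objective, low, high, n_init=4, n_iter=15, seed=0):
    """Minimize `objective` on [low, high] with a GP surrogate and EI."""
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(low, high, n_init)
    y_obs = np.array([objective(x) for x in x_obs])
    grid = np.linspace(low, high, 501)  # candidate hyperparameter values
    for _ in range(n_iter):
        # The surrogate is refitted on every evaluation made so far.
        mu, sigma = gp_posterior(x_obs, y_obs, grid)
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    i = int(np.argmin(y_obs))
    return x_obs[i], y_obs[i]

def toy_validation_error(h):
    """Hypothetical error curve as a function of one hyperparameter h."""
    return (h - 0.3) ** 2 + 0.05 * math.sin(15 * h)

best_h, best_err = bayes_opt(toy_validation_error, 0.0, 1.0)
```

In practice the objective would be the cross-validated error of a model trained with the candidate hyperparameters, and the search would run over several hyperparameters at once; the one-dimensional grid here only keeps the sketch short.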

C. Evaluation metrics

As part of this study, we assessed the accuracy of the forecasting models using three metrics: root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}, \tag{2}$$

$$\mathrm{MAE} = \frac{\sum_{t=1}^{n} |y_t - \hat{y}_t|}{n}, \tag{3}$$

$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n} \left|\frac{y_t - \hat{y}_t}{y_t}\right| \%, \tag{4}$$

where $y_t$ is the actual energy consumption, $\hat{y}_t$ is the corresponding predicted energy consumption, and $n$ is the number of records. Lower RMSE, MAE, and MAPE values imply better precision and prediction quality.
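The three metrics can be computed directly from their definitions. The sketch below uses plain Python; the function names and the sample values are illustrative, and the MAPE function assumes that no actual value is zero.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (2)."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (3)."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (4).

    Assumes every actual value is non-zero.
    """
    n = len(y_true)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))

# Hypothetical actual and predicted energy consumption values.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

RMSE and MAE are expressed in the units of the target, whereas MAPE is scale-free, which is why the three are reported together.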


D. Prediction framework

This study compares the prediction accuracy and efficiency of fourteen machine-learning models for predicting the WWTP's energy consumption. The overall procedure is shown in Figure 3. First, the data are preprocessed by eliminating outliers and imputing missing values. Then, the WWTP data are divided into training and test sets. We train the fourteen machine-learning algorithms on the training set and evaluate them on the test set. The models are calibrated using Bayesian optimization during the training stage. Three statistical criteria are used to determine the best forecasting model: RMSE, MAE, and MAPE.

Fig. 3. Illustration of the adopted prediction framework.

IV. RESULTS AND DISCUSSION

In this study, the data were first split into 75% for training and 25% for testing. The models were trained using data collected between January 1, 2014, and January 28, 2018, and tested using data collected between January 29, 2018, and June 27, 2019. Fourteen models were constructed from the training data and used to predict energy consumption. The investigated models were built on the training data through 5-fold cross-validation and calibrated using Bayesian optimization. Table II summarizes the hyperparameter values computed by the Bayesian optimization algorithm.

TABLE II

SEARCH RANGE OF HYPERPARAMETERS.
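The 75%/25% chronological split and the 5-fold partition of the training set described above can be sketched as follows. The paper does not say whether the folds were shuffled; this sketch assumes contiguous, unshuffled folds. The helper names and the 20-record placeholder series are hypothetical.

```python
def chronological_split(records, train_fraction=0.75):
    """Split time-ordered records into train and test sets without shuffling."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]

def kfold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) pairs for contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

# Placeholder for 20 daily records already sorted by date.
records = list(range(20))
train, test = chronological_split(records)

# Cross-validation is applied to the training portion only,
# so the test period stays unseen during model calibration.
folds = list(kfold_indices(len(train), n_folds=5))
```

Keeping the test set strictly later in time than the training set mirrors the date ranges given above and avoids evaluating a model on records that precede its training data.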

Machine learning-based prediction methods are usually developed without considering information from past data, which makes it difficult for them to capture the dynamics in the data. Hence, in this experiment, we consider time-lagged energy consumption data when predicting energy consumption; the aim is to investigate the impact of lagged data on prediction accuracy. To this end, we add the lag-1 energy consumption as an input. In addition, we use only the most important variables as inputs to the machine-learning methods to reduce computation and obtain parsimonious models. Specifically, we adopt the random forest method to select a subset of features. Non-informative and redundant input variables are thus ignored when building the predictive models.
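Adding the lag-1 energy consumption as an input amounts to pairing each record with the target value of the previous record. A minimal sketch in plain Python is given below; the function name `add_lag_feature`, the two example input variables, and all numeric values are hypothetical.

```python
def add_lag_feature(features, target, lag=1):
    """Append the target value from `lag` steps earlier to each feature row.

    The first `lag` records are dropped because they have no lagged value.
    """
    X, y = [], []
    for t in range(lag, len(target)):
        X.append(list(features[t]) + [target[t - lag]])
        y.append(target[t])
    return X, y

# Hypothetical daily records: [average inflow, average temperature].
features = [[2.0, 15.0], [2.5, 17.0], [2.2, 16.0], [2.8, 20.0]]
# Hypothetical daily energy consumption, aligned with the records above.
energy = [270.0, 290.0, 280.0, 310.0]

X, y = add_lag_feature(features, energy, lag=1)
```

Each row of `X` now contains the same-day inputs followed by the previous day's energy consumption, so a static regression model can exploit the temporal dependence in the series.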

The results of the RF-based selection are shown in Figure 4. Five features are retained as the most important for energy consumption prediction: the time-lagged energy consumption at lag 1, average inflow, average outflow, average temperature, and average humidity. We observe that the time-lagged energy consumption has a significant impact on the prediction result.

Fig. 4. Selecting important variables using the RF algorithm.

Table III provides the prediction results of the investigated machine-learning models using the testing data. Results indicate the promising performance of the optimized machine-learning methods in predicting energy consumption. Also, incorporating time-lagged data makes it possible to capture the dynamics in the data. Based on the comparison in Table III, LightGBM had the lowest RMSE and MAE, at 37.38 and 28.63, respectively. The next-best model was GPR, with RMSE values ranging from 37.36 to 37.45 and MAPE values ranging from 10.02 to 10.12. Decision trees were generally the fastest of all models in terms of training time.

V. CONCLUSIONS

Accurate energy consumption prediction is crucial for optimizing and managing WWTPs. This work aims to develop data-driven models to predict the energy consumption of WWTPs. We explored and compared fourteen machine-learning methods for predicting energy consumption in a WWTP in Melbourne. The methods investigated include SVR, GPR, RF, boosted trees, bagged trees, and LightGBM. More than five years of multivariate data collected from the Melbourne water treatment plant and weather station were used to investigate the effectiveness of the studied methods. We employed Bayesian optimization to calibrate the models and feature-importance selection to build reduced, parsimonious predictive models. Results revealed that the dynamic, optimized LightGBM model achieved the lowest RMSE and MAE, at 37.38 and 28.63, respectively. These results indicate that LightGBM can achieve higher performance than the other models considered here.

Despite the satisfactory prediction performance obtained using the LightGBM model, we plan in future work to develop and evaluate deep learning models, such as RNNs [24], for predicting energy consumption in WWTPs and water desalination plants. Another direction for improvement is to incorporate the attention mechanism [25] into machine-learning models to select only the relevant features and obtain efficient models.

TABLE III

PREDICTION RESULTS OF MACHINE LEARNING MODELS USING TESTING DATA.

REFERENCES

[1] Y. Gu, Y. Li, X. Li, P. Luo, H. Wang, X. Wang, J. Wu, and F. Li, “Energy self-sufficient wastewater treatment plants: feasibilities and challenges,” Energy Procedia, vol. 105, pp. 3741–3751, 2017.

[2] K. Smith and S. Liu, “Energy for conventional water supply and wastewater treatment in urban China: a review,” Global Challenges, vol. 1, no. 5, p. 1600016, 2017.

[3] Y. Qiu, H.-c. Shi, and M. He, “Nitrogen and phosphorous removal in municipal wastewater treatment plants in China: a review,” International Journal of Chemical Engineering, vol. 2010, 2010.

[4] D. Panepinto, S. Fiore, M. Zappone, G. Genon, and L. Meucci, “Evaluation of the energy efficiency of a large wastewater treatment plant in Italy,” Applied Energy, vol. 161, pp. 404–411, 2016.

[5] O. Nowak, P. Enderle, and P. Varbanov, “Ways to optimize the energy balance of municipal wastewater systems: lessons learned from Austrian applications,” Journal of Cleaner Production, vol. 88, pp. 125–131, 2015.

[6] K. Mizuta and M. Shimada, “Benchmarking energy consumption in municipal wastewater treatment plants in Japan,” Water Science and Technology, vol. 62, no. 10, pp. 2256–2262, 2010.

[7] J. Wang, K. Wan, X. Gao, X. Cheng, Y. Shen, Z. Wen, U. Tariq, and M. J. Piran, “Energy and materials-saving management via deep learning for wastewater treatment plants,” IEEE Access, vol. 8, pp. 191694–191705, 2020.

[8] F. Harrou, T. Cheng, Y. Sun, T. Leiknes, and N. Ghaffour, “A data-driven soft sensor to forecast energy consumption in wastewater treatment plants: A case study,” IEEE Sensors Journal, vol. 21, no. 4, pp. 4908–4917, 2020.

[9] T. Cheng, F. Harrou, F. Kadri, Y. Sun, and T. Leiknes, “Forecasting of wastewater treatment plant key features using deep learning-based models: A case study,” IEEE Access, vol. 8, pp. 184475–184485, 2020.

[10] S. Zhang, H. Wang, and A. A. Keller, “Novel machine learning-based energy consumption model of wastewater treatment plants,” ACS ES&T Water, vol. 1, no. 12, pp. 2531–2540, 2021.

[11] M. S. Zaghloul and G. Achari, “Application of machine learning techniques to model a full-scale wastewater treatment plant with biological nutrient removal,” Journal of Environmental Chemical Engineering, vol. 10, no. 3, p. 107430, 2022.

[12] C. Boncescu, L. Robescu, D. Bondrea, and M. Măcinic, “Study of energy consumption in a wastewater treatment plant using logistic regression,” in IOP Conference Series: Earth and Environmental Science, vol. 664, no. 1. IOP Publishing, 2021, p. 012054.

[13] N. A. Ramli and M. F. Abdul Hamid, “Data based modeling of a wastewater treatment plant by using machine learning methods,” vol. 6, pp. 14–21, May 2019.

[14] D. Torregrossa, U. Leopold, F. Hernández-Sancho, and J. Hansen, “Machine learning for energy cost modelling in wastewater treatment plants,” Journal of Environmental Management, vol. 223, pp. 1061–1067, 2018.

[15] D. Torregrossa, G. Schutz, A. Cornelissen, F. Hernández-Sancho, and J. Hansen, “Energy saving in WWTP: daily benchmarking under uncertainty and data availability limitations,” Environmental Research, vol. 148, pp. 330–337, 2016.

[16] R. Oulebsir, A. Lefkir, A. Safri, and A. Bermad, “Optimization of the energy consumption in activated sludge process using deep learning selective modeling,” Biomass and Bioenergy, vol. 132, p. 105420, 2020.

[17] Z. Zhang, A. Kusiak, Y. Zeng, and X. Wei, “Modeling and optimization of a wastewater pumping system with data-mining methods,” Applied Energy, vol. 164, pp. 303–311, 2016.

[18] Z. Zhang, Y. Zeng, and A. Kusiak, “Minimizing pump energy in a wastewater processing plant,” Energy, vol. 47, no. 1, pp. 505–514, 2012.

[19] R. Zhao, D. Wei, Y. Ran, G. Zhou, Y. Jia, S. Zhu, and Y. He, “Building cooling load prediction based on LightGBM,” IFAC-PapersOnLine, vol. 55, no. 11, pp. 114–119, 2022.

[20] H. Deng, F. Yan, H. Wang, L. Fang, Z. Zhou, F. Zhang, C. Xu, and H. Jiang, “Electricity price prediction based on LSTM and LightGBM,” in 2021 IEEE 4th International Conference on Electronics and Communication Engineering (ICECE). IEEE, 2021, pp. 286–290.

[21] L. Yang and A. Shami, “On hyperparameter optimization of machine learning algorithms: Theory and practice,” Neurocomputing, vol. 415, pp. 295–316, 2020.

[22] Y. Alali, F. Harrou, and Y. Sun, “Optimized Gaussian process regression by Bayesian optimization to forecast COVID-19 spread in India and Brazil: A comparative study,” in 2021 International Conference on ICT for Smart Society (ICISS). IEEE, 2021, pp. 1–6.

[23] S. Arslan, “A hybrid forecasting model using LSTM and Prophet for energy consumption with decomposition of time series data,” PeerJ Computer Science, vol. 8, p. e1001, 2022.

[24] F. Harrou, Y. Sun, A. S. Hering, M. Madakyaru et al., Statistical process monitoring using advanced data-driven and deep learning approaches: theory and practical applications. Elsevier, 2020.

[25] A. Dairi, F. Harrou, S. Khadraoui, and Y. Sun, “Integrated multiple directed attention-based deep learning for improved air pollution forecasting,” IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–15, 2021.