
Jurnal Teknik Informatika dan Sistem Informasi (Jatisi), ISSN 2407-4322, E-ISSN 2503-2933, Vol. 10, No. 2, Juni 2023, Hal. 440-455

http://jurnal.mdp.ac.id jatisi@mdp.ac.id

Distance Correlation-Based Regression Tree Algorithm For Structural Damage Detection

Jimmy Tjen*, Genrawan Hoendarto, Tony Darmanto, Thommy Willay. Informatics Department, Universitas Widya Dharma, Pontianak, Indonesia, 78117.

Jimmy.tjen@mathmods.eu*, genrawan@widyadharma.ac.id, tdarmanto1289@gmail.com, w.thommy@gmail.com

Abstrak

This study proposes a new structural damage detection algorithm based on the Regression Tree (RT) algorithm from decision tree learning and on the distance correlation, a nonlinear counterpart of Pearson's correlation, in order to reduce the number of sensors required on a structure without degrading its performance. Numerical simulations of the proposed algorithm on a dataset provided by the Los Alamos National Laboratory (LANL) show that it achieves a model accuracy comparable to the RT method while requiring less sensor information (in this case only 5 of the 24 available sensors), and that it is more stable in predicting structural damage, with false positive and false negative rates below 15%.

Furthermore, the proposed algorithm can be run about 4 times faster than the standard RT method, based on simulations performed on a dataset of 4,096 samples on an 8-core machine with 16 GB of RAM.

In real-world applications, this algorithm can be used to determine sensor placement positions inside a building, so that users can monitor the condition of the building structure precisely while keeping maintenance costs lower.

Kata kunci: Structural Health Monitoring, Machine Learning, Regression Tree, Distance Correlation

Abstract

This paper proposes a novel fault detection algorithm based on the Regression Tree (RT) algorithm from decision tree learning and the distance correlation, a nonlinear counterpart of Pearson's correlation, to reduce the number of sensors without significantly decreasing the model predictive accuracy and the fault diagnosis capability. A numerical validation on an experimental dataset provided by the Los Alamos National Laboratory (LANL), carried out with the MATLAB software, shows that the proposed algorithm has a model predictive accuracy comparable to the classical RT while requiring a smaller number of sensors (5 instead of 24), and that it is more robust in detecting faults, with false negative and false positive rates below 15%. Furthermore, we demonstrate that the proposed algorithm runs about 4 times faster than the classical RT on an experimental dataset with 4096 samples on an 8-core, 16 GB RAM machine. In a real-life setup, the proposed algorithm can be used to provide a sensor installation plan for a structure, such that the user can still monitor the presence of a fault inside a building precisely, but at a lower maintenance cost.

Keywords: Structural Health Monitoring, Machine Learning, Regression Tree, Distance Correlation


1. INTRODUCTION

With the recent advancement of the industrial revolution, it is now common for a man-made structure to be equipped with sensors (accelerometers) that measure various kinds of physical properties. This information is useful because a mathematical model can be derived from it in order to check the quality of the structure. This kind of approach is called the data-driven approach [1] and has been used widely as an alternative to the model-based approach, which is usually hard to derive, especially when the structure is too complex to be modeled [2], [3]. There are many algorithms that utilize this approach; one of them is the Regression Tree (RT) from the Classification and Regression Tree (CART) algorithm.

The RT is a regression method that partitions the data and represents it in a tree-like structure, the so-called binary decision tree [4]. Each partition (or leaf) of the binary tree contains samples of data which are analogous to each other (e.g., measured at the same temperature, or observed on the same day or time), and the prediction of this algorithm is given as the average of the samples contained in the corresponding leaf [5]. This algorithm has been used widely in the field of structural monitoring, e.g., the works in [6-10].

Related works: The authors in [6] presented a damage detection algorithm based on the Random Forest (RF) algorithm, which is a collection of RTs. They showed that the proposed algorithm was able to accurately model the seismic-induced vibration in a structure. In [7], the authors presented an entropy-based sensor selection (e-ss) RT algorithm, which was able to detect faults efficiently while requiring fewer accelerometers (5 instead of 24) than the classical RT method. In [8], the authors demonstrated that it is possible to predict the energy consumption in a building with RT and poly-exponential models, which were able to model the energy consumption with 96% accuracy. In [9], the authors showed that the ensemble CART algorithm was able to provide an accurate and robust prediction in modeling the compressive strength of Manufactured Sand (MS). The author in [10] showed that RT and particle swarm optimization can be used to model the dynamics of piezoelectric sensors, with an accuracy of 98%.

Even though the RT algorithm can precisely model various kinds of features, as shown above, it does have some flaws. In particular, the RT algorithm is considered to be computationally heavy [11], with a time complexity on the order of $O(m \cdot n\log n)$, where $n$ is the number of samples and $m$ is the number of features. Thus, the bigger the dataset, the longer the time needed to complete a single run of this algorithm. Hence, in order to run this algorithm efficiently, it is necessary to drop features or parameters that are "useless" for the model.

To tackle this issue, we propose a new RT algorithm based on the concept of distance correlation from probability theory, to reduce the number of features (or sensors in this context) without significantly reducing the predictive model accuracy and the fault detection capability. We validate our methodology on a dataset provided by the Los Alamos National Laboratory (LANL) with the MATLAB software and show that the proposed algorithm performs as well as the classical RT algorithm (or even better, as happened in our experimental setup) while requiring a significantly smaller number of sensors.

This paper is organized as follows: In Section 2, we discuss the basic notions of distance correlation, the regression tree and the fault detection algorithm proposed in [12] and [7]. Section 3 focuses on how to perform fault detection based on the distance correlation-based regression tree and describes the data structure of the experimental setup provided in the LANL online data repository. Section 4 shows the numerical validation on the experimental dataset with the MATLAB software. Finally, in the last section, we conclude the results of our research and discuss possible directions for similar research.

2. THEORETICAL BACKGROUND

In this section, we briefly discuss the idea behind the distance correlation, which is the nonlinear version of Pearson's correlation [13]. Furthermore, we present the notion of the regression tree, which has been a widely used method in big data analysis (e.g., for regression tasks).

2.1 Distance Correlation

In the field of statistics and probability theory, the distance correlation is a measure of dependence between 2 random vectors (possibly with arbitrary dimensions). Contrary to Pearson's correlation, which measures the linear correlation of a pair of random variables, the distance correlation is 0 if and only if the pair of variables is independent (i.e., not related) [14].

Let $D = \{d_t;\ t \in \mathbb{N}\}$ be a dataset containing vectors of measurements $d_t = [d^1_t\ d^2_t\ \cdots\ d^m_t]$, where $d^i_t$ denotes the measurement of the $i$-th variable at instance $t$ and $i = 1, 2, \ldots, m$. Suppose that we are interested in understanding how the variables $x$ and $y$ (i.e., $d^x, d^y \in D$) are related to each other via the sample distance correlation. With abuse of notation, let

$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{bmatrix} \tag{1}$$

be the distance matrix of $x$, where $a_{j,k} = \|x_j - x_k\|$ and $\|\cdot\|$ denotes the $\ell_2$ norm or the Euclidean distance [15]. Furthermore, let

$$\hat A = \begin{bmatrix} \hat a_{1,1} & \hat a_{1,2} & \cdots & \hat a_{1,n} \\ \hat a_{2,1} & \hat a_{2,2} & \cdots & \hat a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \hat a_{n,1} & \hat a_{n,2} & \cdots & \hat a_{n,n} \end{bmatrix} \tag{2}$$

be the matrix of double-centered distances, where each element of matrix $\hat A$ is defined as

$$\hat a_{j,k} = a_{j,k} - \bar a_{j,\cdot} - \bar a_{\cdot,k} + \bar a_{\cdot,\cdot};\qquad
\bar a_{j,\cdot} = \frac{1}{n}\sum_{k=1}^{n} a_{j,k},\quad
\bar a_{\cdot,k} = \frac{1}{n}\sum_{j=1}^{n} a_{j,k},\quad
\bar a_{\cdot,\cdot} = \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n} a_{j,k}, \tag{3}$$

where $\bar a_{j,\cdot}$, $\bar a_{\cdot,k}$ and $\bar a_{\cdot,\cdot}$ denote the mean of the $j$-th row, the mean of the $k$-th column and the grand mean of matrix $A$, respectively [16]. By following (1) to (3), it is also possible to derive the distance matrix and the double-centered matrix for $y$. We denote by $B$ and $\hat B$ the distance and double-centered matrices of $y$, with $b_{j,k}$ and $\hat b_{j,k}$ their elements in the $j$-th row and $k$-th column.


Given the definitions above, the sample distance correlation of $x$ and $y$, $\hat\rho(x, y)$, is defined as:

$$\hat\rho(x,y) = \frac{\mathcal{V}_{x,y}}{\sqrt{\mathcal{V}_x\,\mathcal{V}_y}};\qquad
\mathcal{V}_{x,y} = \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n}\hat a_{j,k}\,\hat b_{j,k},\quad
\mathcal{V}_x = \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n}\hat a_{j,k}^2,\quad
\mathcal{V}_y = \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n}\hat b_{j,k}^2, \tag{4}$$

where $\mathcal{V}_{x,y}$ is the distance covariance of $x$ and $y$, and $\mathcal{V}_x$ and $\mathcal{V}_y$ are the distance variances of $x$ and $y$, respectively.

In particular, $\hat\rho(\cdot,\cdot) \in [0, 1]$. The closer its value is to 1, the stronger the correlation between the variables, and vice versa. Furthermore, it is worth noting that the distance correlation does not explicitly state the type of relation between a pair of variables (e.g., linear, quadratic, etc.); it only represents the similarity of the 2 variables based on the Euclidean distance. With respect to the time complexity, this computation is in $O(n^2)$, where $n$ is the number of samples. Thus, the process can become significantly long if the dataset contains numerous samples [17].
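As an illustration of Equations (1) to (4), the following sketch computes the sample distance correlation of two one-dimensional variables with NumPy; it is a minimal reimplementation for exposition, not the code used in this paper, and the quadratic toy data at the end is our own example.

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation statistic of Eq. (4) for two 1-D variables."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = x.shape[0]

    def double_center(v):
        a = np.abs(v - v.T)                        # distance matrix, Eq. (1)
        return (a - a.mean(axis=1, keepdims=True)  # Eqs. (2)-(3): remove row mean,
                  - a.mean(axis=0, keepdims=True)  # remove column mean,
                  + a.mean())                      # add the grand mean back

    A, B = double_center(x), double_center(y)
    v_xy = (A * B).sum() / n**2                    # distance covariance term
    v_x = (A * A).sum() / n**2                     # distance variance of x
    v_y = (B * B).sum() / n**2                     # distance variance of y
    return 0.0 if v_x == 0 or v_y == 0 else v_xy / np.sqrt(v_x * v_y)

# A quadratic dependence: near-zero Pearson correlation, clearly nonzero dcor.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
y = x**2 + 0.05 * rng.standard_normal(500)
print(round(np.corrcoef(x, y)[0, 1], 3), round(dcor(x, y), 3))
```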

2.2 Regression Tree (RT)

The Regression Tree (RT) is part of the Classification and Regression Tree (CART) algorithm and is used to solve problems with continuous (i.e., real-valued) target variables.

The RT algorithm solves a least-squares problem by optimally splitting the data into smaller subsets such that the variance within each subset is minimized [18]. In the prediction step, the estimate of the target variable is then given as the average of the samples contained in the corresponding subset [8].

Let $D = \{d_t;\ t \in \mathbb{N}\}$ be a dataset and $Y = [y_1\ y_2\ \cdots\ y_n]$ be an output vector, where $d_t = [d^1_t\ d^2_t\ \cdots\ d^m_t]$, $t = 1, 2, \ldots, n$, is a vector whose component $d^i_t$ denotes the measurement of variable $i$ at time $t$. Suppose that we are interested in deriving a mathematical model for $y_t$ given $d_t$ at instance $t$. In particular, we want to define a model

$$\hat y_t = f_{RT}(d^1_t, d^2_t, \ldots, d^m_t), \tag{5}$$

where $\hat y_t$ is the estimate of $y_t$ at instance $t$ and $f_{RT}(\cdot): \mathbb{R}^m \to \mathbb{R}$ is the regression tree function. In the beginning, the RT algorithm creates a tree-like structure $\mathcal{T}$ which splits the data into $M$ hyper-rectangular sets $R_\ell$, where $\bigcup_{\ell=1}^{M} R_\ell = D$, such that $R_\ell$ is the subset of data corresponding to leaf $\ell = 1, 2, \ldots, M$ of the tree $\mathcal{T}$. Given $\ell(d_t) = \{\ell : d_t \in R_\ell\}$, the function that maps the input $d_t \in \mathbb{R}^m$ to the corresponding leaf $R_\ell$, the prediction of the target variable $y_t$ can be written as

$$\hat y_t = \frac{1}{|R_{\ell(d_t)}|}\sum_{d_i \in R_{\ell(d_t)}} y_i, \tag{6}$$

where $|R_{\ell(d_t)}|$ denotes the cardinality or the number of samples contained in the set $R_{\ell(d_t)}$. In particular, Equation (6) states that the prediction of $y_t$ is given as the average of the samples contained in the partition $R_{\ell(d_t)}$ (we refer the reader to reference [18] for a detailed explanation of the RT).
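The leaf-averaging rule in Equations (5) and (6) can be checked with an off-the-shelf regression tree; the sketch below uses scikit-learn and invented data (not the LANL dataset), and simply verifies that the tree's prediction for a new point equals the mean target of the training samples that fall in the same leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
D = rng.standard_normal((400, 3))                 # n = 400 samples, m = 3 features
y = np.sin(D[:, 0]) + 0.5 * D[:, 1] + 0.1 * rng.standard_normal(400)

tree = DecisionTreeRegressor(max_leaf_nodes=16, random_state=0).fit(D, y)

d_new = rng.standard_normal((1, 3))
leaf = tree.apply(d_new)[0]                       # index of the leaf R_l(d_new)
in_same_leaf = tree.apply(D) == leaf              # training samples in that leaf
leaf_mean = y[in_same_leaf].mean()                # Eq. (6): average over the leaf

print(tree.predict(d_new)[0], leaf_mean)          # the two numbers coincide
```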

A collection of RTs is called a Random Forest (RF). The RF algorithm takes the average of multiple RT outputs as its own output. In particular, the RF was introduced as a solution to the overfitting issue present in the RT: by averaging over RTs, the RF decreases the model variance [6].

Despite its capability, the RT algorithm is considered to be computationally expensive.

In particular, the RT algorithm is in $O(m \cdot n\log n)$, while the RF is in $O(k \cdot m \cdot n\log n)$, where $k$ is the number of RTs, $n$ is the number of samples and $m$ is the number of parameters or features [11]. Thus, the bigger the dataset (i.e., the more samples it contains), the longer it takes to complete the algorithms.

2.3 RT-based Damage Detection

In Reference [7], the authors proposed an RT-based fault detection algorithm. To apply this strategy, we need to expand the dataset $D$ defined in the previous section into an extended dataset $D_e$ by adding autoregressive terms of order $p$ as features, such that

$$D_e = \{[d_t\ \ d_{t-1}\ \cdots\ d_{t-p}]\}_{t=1,\ldots,n}, \tag{7}$$

with $d_t = [d^1_t\ d^2_t\ \cdots\ d^m_t] \in \mathbb{R}^m$. Next, following the same explanation as in Section 2.2, we obtain a binary tree $\mathcal{T}$ with $M$ partitions or leaves. Then, for each leaf in $\mathcal{T}$ we define an autoregressive (AR) model

$$d^{\,j}_t = \sum_{i=1}^{p}\alpha^{\,j}_{i,\ell(\bar d_t)}(0)\; d^{\,j}_{t-i}, \tag{8}$$

where $\ell(\bar d_t) = \{\ell : \bar d_t \in R_\ell\}$ is an indexing function that assigns the current input $\bar d_t = [d_{t-1}\ \cdots\ d_{t-p}]$ to the corresponding leaf in $\mathcal{T}$, and $\alpha^{\,j}_{i,\ell(\bar d_t)}(0)$ is the initial $i$-th AR model parameter for feature $j$ in the corresponding leaf $\ell(\bar d_t)$.

We can now leverage the model defined in (8) to detect faults. When the condition is nominal (i.e., there is no damage present in the structure), the dynamics of the AR model in (8) remain constant, and vice versa. In particular, we redefine the AR model in (8) as

$$d^{\,j}_t = \sum_{i=1}^{p}\alpha^{\,j}_{i,\ell(\bar d_t)}(t)\; d^{\,j}_{t-i} + v_t, \tag{9}$$

and assign the following dynamics

$$\theta_{t+1} = \theta_t + w_t, \qquad d^{\,j}_t = \bar k_t\,\theta_t + v_t, \tag{10}$$

where $\theta_t = [\alpha^{\,j}_{1,1}(t)\ \cdots\ \alpha^{\,j}_{p,1}(t)\ \cdots\ \alpha^{\,j}_{1,M}(t)\ \cdots\ \alpha^{\,j}_{p,M}(t)]^\top$ is the state vector, $\bar k_t = [\bar d^{\,j}_t\,\delta_{1,\ell(\bar d_t)}\ \cdots\ \bar d^{\,j}_t\,\delta_{M,\ell(\bar d_t)}]$ with $\bar d^{\,j}_t = [d^{\,j}_{t-1}\ \cdots\ d^{\,j}_{t-p}]$ is the time-varying observation matrix, $\delta_{\cdot,\cdot}$ is the Kronecker delta, and $w_t$ and $v_t$ are uncorrelated white noise signals with variances $\sigma_w^2$ and $\sigma_v^2$, respectively. The dynamical system in (10) can be solved by leveraging the Kalman filter algorithm (we refer the reader to [19] for a detailed discussion of this method). In particular, the Kalman filter solution for (10) yields the following series of computations [7]:

$$
\begin{aligned}
\hat\theta_{t|t-1} &= \hat\theta_{t-1|t-1}, & P_{t|t-1} &= P_{t-1|t-1} + I\cdot\sigma_w^2,\\
\hat d^{\,j}_{t|t-1} &= \bar k_t\,\hat\theta_{t|t-1}, & S_t &= \bar k_t P_{t|t-1}\bar k_t^\top + \sigma_v^2,\\
r_t &= d^{\,j}_t - \hat d^{\,j}_{t|t-1}, & K_t &= P_{t|t-1}\bar k_t^\top S_t^{-1},\\
\hat\theta_{t|t} &= \hat\theta_{t|t-1} + K_t r_t, & P_{t|t} &= (I - K_t\bar k_t)\,P_{t|t-1},
\end{aligned}\tag{11}
$$

where $\hat\theta_{t|t-1}$ and $P_{t|t-1}$ are the a priori estimates of the state and of the state covariance matrix, $\hat\theta_{t|t}$ and $P_{t|t}$ are the corresponding a posteriori estimates, and $S_t$, $r_t$ and $K_t$ are the pre-fit residual covariance, the prediction residual and the Kalman gain at instance $t$, respectively. This estimate of $\hat\theta_t$, which is generated at run time, can then be used to track whether a fault or damage is present in the system.

In general, the whole process of the Kalman filter for a state vector of size $M \cdot p$ is in $O(M^3 p^3)$ per instance [20]. In this context, the damage detection step presented in [7] therefore takes on the order of $O(n\, M^3 p^3)$ operations, which becomes very slow when the number of samples is large.
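The recursion in (10) and (11) amounts to a scalar-output Kalman filter with a random-walk state; the sketch below is a simplified, single-leaf illustration with invented noise variances and an AR(2) toy signal, not the exact configuration of [7].

```python
import numpy as np

def kalman_track(d, h_rows, var_w=1e-6, var_v=1e-2):
    """Track time-varying AR parameters theta_t for one sensor, following Eq. (11).

    d      : (n,) observed signal d_t^j
    h_rows : (n, q) time-varying observation rows k_t (lagged samples)
    var_w, var_v : process / measurement noise variances (placeholders)
    """
    n, q = h_rows.shape
    theta = np.zeros(q)                        # state estimate
    P = np.eye(q)                              # state covariance
    trace = np.empty(n)
    for t in range(n):
        P = P + var_w * np.eye(q)              # prediction: random-walk state, Eq. (10)
        k = h_rows[t]
        r = d[t] - k @ theta                   # prediction residual
        S = k @ P @ k + var_v                  # pre-fit residual covariance
        K = P @ k / S                          # Kalman gain
        theta = theta + K * r                  # a posteriori state
        P = (np.eye(q) - np.outer(K, k)) @ P   # a posteriori covariance
        trace[t] = np.linalg.norm(theta)       # norm of the estimate, used later for detection
    return theta, trace

# Toy usage: AR(2) signal, single leaf, so k_t = [d_{t-1}, d_{t-2}].
rng = np.random.default_rng(2)
n, a = 500, np.array([0.6, 0.3])
d = np.zeros(n)
for t in range(2, n):
    d[t] = a @ d[t-2:t][::-1] + 0.05 * rng.standard_normal()
H = np.column_stack([np.roll(d, 1), np.roll(d, 2)])
H[:2] = 0.0
theta_hat, zeta = kalman_track(d, H, var_w=1e-6, var_v=0.05**2)
print(theta_hat)                               # should end up close to [0.6, 0.3]
```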

3. METHODOLOGY AND EXPERIMENTAL SETUP

In this section, we will discuss how to perform the distance correlation-based RT (d-ss RT), which is the RT algorithm performed on a subset of data (in this context, also sensors), selected according to the distance correlation. In particular, the discussion will be focused on how the d-ss RT is constructed, up to how to detect the fault with the d-ss RT algorithm.

Furthermore, we will also discuss the data structure used in this paper, which is provided by the engineering department of LANL.

3.1 Distance Correlation Based Regression Tree (d-ss RT)

In this section, we show how to construct the d-ss RT model from scratch. For the sake of simplicity, let us redefine $D$ in Sections 2.1 and 2.2 to be a dataset that comes from the readings of $m$ sensors or accelerometers, such that $d^i \in D$, $i = 1, 2, \ldots, m$, now corresponds to the measurement of the vibrational dynamics by the $i$-th sensor. Suppose that we would like to track the dynamics of $d^z \in D$ with a subset of the data with cardinality $m_s$. In other words, we want to estimate the dynamics of $d^z$ with only $m_s$ sensors. The whole process of the algorithm can be summarized in 3 steps: (1) create the d-ss subset, (2) derive the RT model for the subset and (3) damage diagnosis.

Step 1. For every possible pair $(d^i, d^z)$ with $i = 1, 2, \ldots, m$, we calculate the distance correlation $\hat\rho(d^i, d^z)$ by following the series of equations in (1) to (4). Next, let $\rho_z = [\hat\rho(d^1, d^z)\ \hat\rho(d^2, d^z)\ \cdots\ \hat\rho(d^m, d^z)]$ be the distance correlation vector of $d^z$. The subset for $d^z$ is then created by selecting the $m_s$ sensors that have the highest distance correlation to $d^z$; this is done by sorting $\rho_z$ from the highest to the lowest value and picking its top $m_s$ elements. By default, $d^z$ correlates perfectly with itself, so only $m_s - 1$ additional distinct sensors are included in the subset. The whole process of the first step is summarized in Algorithm 1.

Algorithm 1: Distance Correlation-Based Subset Selection

Inputs: dataset $D \in \mathbb{R}^{n \times m}$; the variable to predict $d^z$; the desired number of sensors $m_s$
Outputs: a set of indices $S_{m_s} \subset \{1, \ldots, m\}$ with $|S_{m_s}| = m_s$
Process:
  $S_{m_s} \leftarrow \{z\}$
  for $i = 1, 2, \ldots, m_s - 1$
    $j \leftarrow \arg\max_{k \notin S_{m_s}} \hat\rho(d^k, d^z)$
    $S_{m_s} \leftarrow S_{m_s} \cup \{j\}$
  end for

Step 2. The next step is to model $d^z$ with the RT algorithm proposed in [7]. Let $D_{S_{m_s}}$ be the subset of the sensor data corresponding to the set of indices $S_{m_s}$. By substituting $D$ with $D_{S_{m_s}}$, the RT model for $d^z$ can be derived by following the series of calculations from (7) to (11). At this point, we obtain the dynamics of $d^z$, which can be used to track the presence of a fault in the structure.

Step 3. To detect the presence of damage, one can leverage the estimate $\hat\theta_t$ obtained in the previous step by comparing it to $\hat\theta_0$, which is the model parameter at the nominal or initial condition. Without loss of generality, let $\zeta$ be the state trace vector, where each instance of it is defined as

$$\zeta_t = \|\hat\theta_t\|, \tag{12}$$

and let

$$UB = (1 + \epsilon_f)\,\zeta_0, \qquad LB = (1 - \epsilon_f)\,\zeta_0 \tag{13}$$

be the upper and lower bound, respectively. We assume a system is in a nominal condition if $\zeta_t$ is contained between the lower and upper bound defined by the tolerance factor $\epsilon_f$:

$$LB \le \zeta_t \le UB, \tag{14}$$

and faulty otherwise. The choice of the tolerance factor $\epsilon_f$ depends on the application. However, a large $\epsilon_f$ will increase the chance of a false positive, while a small $\epsilon_f$ will likely trigger false negatives. Figure 1 shows an example of $\zeta_t$ for both the nominal and the faulty case.
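The decision rule in (12)-(14) reduces to a threshold check on the trace; the sketch below assumes a precomputed trace vector and uses the 2% tolerance adopted later in the numerical validation, returning both the verdict and the first violation index (the exit time used in Section 4).

```python
import numpy as np

def diagnose(zeta, eps_f=0.02):
    """Flag damage when the trace zeta_t leaves the band of Eqs. (13)-(14).

    zeta  : (n,) trace vector, zeta_t = ||theta_t|| as in Eq. (12)
    eps_f : tolerance factor (2% as in the numerical validation)
    Returns (is_faulty, t_exit), where t_exit is the first index outside the
    band, or None if the trace never leaves it.
    """
    lb, ub = (1.0 - eps_f) * zeta[0], (1.0 + eps_f) * zeta[0]   # Eq. (13)
    outside = (zeta < lb) | (zeta > ub)                         # violation of Eq. (14)
    if not outside.any():
        return False, None
    return True, int(np.argmax(outside))                        # first violation index

# Invented trace: flat (nominal) for 200 steps, then drifting (faulty).
zeta = np.concatenate([np.full(200, 1.00), np.linspace(1.00, 1.10, 100)])
print(diagnose(zeta))
```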

Figure 1. The Dynamics of $\zeta_t$: (a) Under the Nominal Dataset; (b) Under a Faulty Dataset. Source: Processed Image by MATLAB (2022)


3.2 Bookshelf-Resembling Structure from LANL

We validate our methodology on the experimental dataset provided by the engineering department of Los Alamos National Laboratory [21]. The dataset was collected from the vibrational readings of a framed structure resembling a 3-tier bookshelf, built from a stack of aluminum plates and columns. Each level is made from a 1.3 cm thick aluminum plate that is fixed to the aluminum columns with 2 bolts. There were 8 accelerometers installed on each level of the structure (24 in total) to measure the vibrational responses excited by a vibrator placed at the bottom of the structure. Figure 2 illustrates the bookshelf-resembling structure provided by LANL.

To generate the datasets, the structure was vibrated with frequencies ranging from 0 to 3 kHz for 8 seconds, producing 4096 samples for each dataset. The process was repeated multiple times with various setups, which produced nominal and faulty datasets. There are 3 different kinds of faulty datasets provided: (1) damage induced in joint (2a), (2) damage induced in joint (4b) and (3) damage induced in both joints.

In this paper, we will consider all 3 faulty cases to verify the capability of the proposed algorithm in detecting faults. In particular, we will use 3000 samples from the nominal datasets to train the model. Then, we will use the rest of the nominal data and all 4096 samples from each of the 3 faulty datasets to measure the model predictive accuracy, the false negative rate of the proposed algorithm and to verify the damage diagnosis sensitivity of our algorithm.
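For concreteness, the data split described above can be organized as in the following sketch; the arrays are random placeholders standing in for the 4096-sample, 24-sensor LANL records, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
# Placeholder arrays standing in for the LANL records: 4096 samples x 24 sensors.
nominal = rng.standard_normal((4096, 24))
faulty = {f"damage_case_{k}": rng.standard_normal((4096, 24)) for k in (1, 2, 3)}

n_train = 3000
train = nominal[:n_train]                       # 3000 nominal samples: model training
test_sets = {"nominal_rest": nominal[n_train:], **faulty}

for name, data in test_sets.items():
    # each set is used to evaluate accuracy, false rates and diagnosis sensitivity
    print(name, data.shape)
```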

Figure 2. The Bookshelf Resembling Structure by LANL.

Source: Adapted from [21]


4. NUMERICAL VALIDATION

In this section, we first validate the model predictive quality of the d-ss RT. As a comparison, we also run the simulation alongside 3 other kinds of RT: the entropy-based subset selection RT (e-ss RT), the linear correlation-based subset selection RT (c-ss RT) and the classical RT, based on the methodology presented in [7] with the MATLAB software. We show that the d-ss RT is more sensitive in detecting faults and can reach the same level of accuracy as the classical RT, but requires a significantly lower amount of computational time (measured in real time on an 8-core, 16 GB RAM machine). Without loss of generality, we chose the cardinality of the subset to be $m_s = 5$. This choice was made after considering the prediction accuracy and the time complexity of the algorithm.

We measure the model predictive accuracy as

$$\%Acc = \left(1 - NRMSE(y, \hat y)\right) \times 100\%, \tag{15}$$

$$NRMSE(y, \hat y) = \frac{1}{\bar y}\sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat y_t\right)^2}, \tag{16}$$

where $NRMSE$ is the Normalized Root Mean Square Error, and $n$, $y$, $\hat y$ and $\bar y$ are the number of samples, the observed variable, the estimate of the observed variable and the mean of the observed variable, respectively. In the field of mathematical modeling, the NRMSE, also known as the Coefficient of Variation (CV), is a measure of how close our model is to reality. The smaller the CV, the closer the estimate resembles the observed value. Thus, by taking $1 - NRMSE$, we are basically considering the proportion (i.e., the variance) of the observed variable that is explainable by the model.
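Equations (15) and (16) translate directly into a few lines of NumPy; the sketch below is a generic implementation of the metric (assuming the mean of the observed variable is nonzero), not the paper's evaluation script.

```python
import numpy as np

def accuracy_percent(y, y_hat):
    """Model predictive accuracy (1 - NRMSE) * 100% as in Eqs. (15)-(16)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    nrmse = rmse / y.mean()            # normalisation by the mean; requires mean(y) != 0
    return (1.0 - nrmse) * 100.0

# Toy check: a small error relative to the signal level gives accuracy near 100%.
y = np.linspace(10.0, 12.0, 100)
y_hat = y + 0.05 * np.sin(np.arange(100))
print(round(accuracy_percent(y, y_hat), 2))
```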

With respect to the fault diagnosis step, we consider the tolerance factor $\epsilon_f$ in (13) and (14) to be 2%. We purposely choose a very tight bound in order to verify which method (among the c-ss, d-ss, e-ss and classical RT) generates the least number of false positives and negatives. In this paper, the false positive and false negative rates are defined as

$$\%FP = \frac{n_{FP}}{24} \times 100\%, \qquad \%FN = \frac{n_{FN}}{24} \times 100\%, \tag{17}$$

where $n_{FP}$ is the number of sensors which misidentified a faulty case as nominal, while $n_{FN}$ counts the opposite of $n_{FP}$. Furthermore, we also characterize the sensitivity of the fault detection with the exit time,

$$t_e = \min\{t : \zeta_t \notin [LB, UB]\}, \tag{18}$$

which is the minimum amount of time needed for the trace vector to leave the nominal bound. The smaller the $t_e$, the more sensitive the algorithm is in detecting the presence of a fault.

4.1 Model Predictive Accuracy

Table 1 summarizes the model predictive accuracy of the RT-based algorithms. It can be seen from Table 1 that the d-ss RT has a level of accuracy comparable to the classical RT while needing only about a quarter of the total time needed to run the classical RT algorithm when the simulation was run on an 8-core, 16 GB RAM machine.

Notice that the c-ss, d-ss and e-ss RT use a smaller number of features than the classical RT, yet they are able to provide models with accuracy almost as good as the classical RT (varying by around 1%). Furthermore, the d-ss RT is faster than the classical RT and performs almost as fast as the c-ss and e-ss RT algorithms (with the d-ss slightly faster than the other 2 methods). This is due to the fact that the RT algorithm has a time complexity in $O(m \cdot n\log n)$, where $n$ is the number of samples and $m$ is the number of features. Since $m_s \ll m$, it is clear that the classical RT will take a longer time than the c-ss, d-ss and e-ss RT.

In summary, Table 1 provides evidence that the c-ss, d-ss and e-ss RT are preferable to the classical RT, with the d-ss RT slightly outperforming the e-ss and c-ss RT in both model predictive accuracy and computational time in our experimental setup. Considering that the d-ss RT is almost 4 times faster than the classical RT, yet able to reach the same level of accuracy, it is not misleading to claim that the d-ss RT is better than the classical RT.

4.2 Damage Detection Quality

Table 2 shows the result of the fault detection diagnosis in the form of the exit time, as in (18), for the classical, c-ss, d-ss and e-ss RT under the nominal dataset. Furthermore, Table 2 also shows the false negative rate of the classical, c-ss, d-ss and e-ss RT and the time needed to simulate all of the algorithms on an 8-core, 16 GB RAM machine. The (-) sign indicates that the corresponding algorithm does not detect any fault in the nominal dataset.


Table 1. The $(1 - NRMSE) \times 100\%$ Model Predictive Accuracy of the RT-Based Methods
Source: Processed Data by MATLAB (2022)

N. Sensor   Classical   c-ss   d-ss   e-ss   (Accuracy in %)

1 94.10 93.88 93.47 92.50

2 93.61 93.24 93.24 92.83

3 89.44 89.65 89.36 87.94

4 91.05 90.87 90.87 90.24

5 94.10 93.67 93.67 93.43

6 93.76 93.53 93.47 93.07

7 89.60 89.73 89.20 87.99

8 91.29 91.11 91.11 90.64

9 92.64 91.82 91.82 91.79

10 93.92 92.08 92.08 92.29

11 88.26 85.18 85.18 85.67

12 88.80 86.97 87.81 87.62

13 93.48 92.51 92.51 92.60

14 93.18 91.89 91.90 92.15

15 88.34 84.97 84.80 85.40

16 89.23 87.36 87.81 87.70

17 91.46 90.99 91.39 91.48

18 92.77 91.11 93.12 92.45

19 93.74 93.90 93.90 94.26

20 94.08 94.12 94.12 93.77

21 91.61 91.23 91.72 88.70

22 93.12 92.64 92.81 93.02

23 94.02 94.05 94.05 92.65

24 94.02 94.05 94.05 93.91

Average 92.07 91.27 91.39 91.00

Time needed 255.53 s 64.16 s 62.13 s 62.53 s


From Table 2, we observe that in the nominal case the d-ss RT produces fewer false negative cases than the other 3 methods. Furthermore, in this experimental setup, it can be seen that the c-ss and the classical RT are more susceptible to false negatives, with $\%FN = 25\%$ for both algorithms. This is due to the fact that the c-ss RT only selects sensors which correlate well (linearly) with the target variable. In statistics, this strategy is avoided in particular, as it might cause multicollinearity [22] and overfitting [23] problems. The multicollinearity problem increases the standard error of the model parameters, which renders significant features insignificant and vice versa, while the overfitting issue (which arises as the model fits the training data too closely) causes the model to fail to predict the test dataset, even though it belongs to the same observation.

In this case, we conclude that even though the c-ss RT performs as well as the d-ss RT with respect to the model predictive accuracy, it suffers heavily from multicollinearity and overfitting issues. Hence, this algorithm is not suitable to be used as a fault detection algorithm.

Table 2. The Exit Time For All 4 RT Algorithms: Nominal Case

N. Sensor   Classical   c-ss   d-ss   e-ss

1 - - - -

2 - - - -

3 4.31 - - -

4 - - - -

5 - - - -

6 - - - -

7 - - - -

8 - - - -

9 - 7.58 - -

10 - - - -

11 - - - -

12 6.65 - - -

13 - 3.43 3.43 -

14 - - - -

15 - - - -

16 - - - -

17 - 4.62 - 5.64

18 - 5.18 - 6.41

19 7.7 - - -

20 6.8 7.48 7.48 6.66

21 - 5.47 - -

22 7.14 - 7.14 6.59

23 - - - 4.48

24 7.46 - - -

%FN   25%   25%   12.5%   20.83%

Time   261.83 s   63.6 s   64.16 s   65.26 s
Source: Processed Data by MATLAB (2022)

Table 3 shows the exit time of the 4 RT algorithms under the faulty datasets. The (-) sign signifies that the respective algorithm failed to detect the presence of a fault in the corresponding subset of sensors. With respect to the fault sensitivity, it can be seen from Table 3 that none of the 4 algorithms failed to detect the presence of faults in damage case 3, as this damage case happens to be the strongest among the 3 faulty datasets provided by LANL. Furthermore, for damage cases 1 and 2, the d-ss and e-ss RT produce a smaller number of false positives with respect to the classical and the c-ss RT, with $\%FP$ less than 5%. This result shows that the d-ss and the e-ss RT are much more robust in detecting the presence of damage than the c-ss and the classical RT. In addition, the d-ss and e-ss RT require only about a quarter of the total time needed by the classical RT to perform fault detection.

Table 3. The Exit Time For All 4 RT Algorithms: Faulty Cases

Sensor Damage Case 1 Damage Case 2 Damage Case 3

Classical c-ss d-ss e-ss Classical c-ss d-ss e-ss Classical c-ss d-ss e-ss

1 0.08 0.08 0.12 0.16 0.04 0.05 0.05 0.07 0.72 0.23 0.24 0.20

2 0.09 0.11 0.11 0.12 0.05 0.05 0.05 0.07 0.88 0.18 0.18 0.18

3 - 1.86 1.14 0.94 - 0.40 0.44 0.38 0.19 0.81 0.52 1.11

4 - 0.96 0.96 1.70 - 1.47 1.47 1.87 1.77 2.91 2.91 3.06

5 0.08 0.09 0.09 0.09 0.06 0.07 0.07 0.04 0.83 0.24 0.24 0.20

6 0.08 0.11 0.11 0.12 0.05 0.07 0.06 0.05 0.85 0.20 0.20 0.16

7 - 1.50 0.67 0.70 - 0.65 0.35 1.03 0.58 1.11 0.29 1.10

8 - 0.82 0.82 1.34 - 1.69 1.69 2.03 1.46 3.51 3.51 2.30

9 0.42 3.28 3.28 1.67 0.45 4.47 4.47 0.66 0.02 4.93 4.93 0.94

10 0.99 2.54 2.54 - 1.47 5.20 5.20 1.38 0.12 4.52 4.52 5.32

11 0.27 3.27 3.27 2.24 0.27 - 1.63 3.28 0.63 1.63 1.63 7.95

12 3.51 4.04 4.06 4.04 2.28 - 4.71 6.37 1.14 1.13 2.91 2.08

13 0.90 - - 1.55 0.81 3.30 3.30 0.67 0.08 5.22 5.22 4.01

14 2.87 0.98 0.80 1.52 1.77 3.98 3.27 0.86 0.12 7.98 4.11 1.25

15 0.55 4.54 2.64 1.87 0.48 - 4.29 3.17 0.37 1.13 2.05 5.81

16 - 2.71 5.33 2.71 7.58 - 1.04 5.53 1.52 0.96 1.05 1.51

17 0.28 1.71 2.22 0.78 0.28 0.83 1.93 0.30 0.18 0.77 1.05 0.50

18 0.60 1.05 2.12 5.07 0.78 0.66 3.54 2.39 0.21 0.91 2.58 1.32

19 0.27 2.89 2.89 2.89 0.09 1.99 1.99 1.99 0.16 0.53 0.53 0.60

20 0.29 6.77 6.77 2.43 0.24 1.99 1.99 1.99 0.24 1.44 1.44 1.57

21 0.30 1.75 2.56 5.98 0.21 0.41 2.51 6.15 0.14 0.66 2.09 1.21

22 0.27 1.94 4.99 2.67 0.43 2.25 1.86 1.67 0.14 2.50 3.77 1.86

23 0.22 5.22 5.22 3.60 0.09 1.96 1.96 3.00 0.16 0.62 0.62 2.04

24 0.28 7.01 7.01 1.18 0.23 2.33 2.33 1.71 0.19 0.70 0.70 1.42

%FP   20.83%   4.17%   4.17%   4.17%   16.67%   16.67%   0%   0%   0%   0%   0%   0%

Time(s) 4055.23 858.35 810.47 813.25 3845.07 825.26 830.14 825.54 3838.57 826.25 829.86 843.49

Source: Processed Data by MATLAB (2022)

In particular, by considering Table 2 and Table 3, we conclude that the d-ss RT is better than the e-ss RT (as it produces the least number of false negative cases), which in turn is better than the c-ss RT. The c-ss RT is still better than the classical RT, as it is 4 times faster while reaching the same quality of model predictive accuracy and fault diagnosis as the classical RT.

5. CONCLUSION

In this paper, we have proposed a novel fault detection algorithm based on the distance correlation from probability theory and the regression tree from the CART algorithm in decision tree learning, to reduce the number of features (or sensors in this context) without significantly reducing the predictive model accuracy and the fault detection capability. We have shown on an experimental setup that the proposed algorithm outperforms the other RT-based approaches and is up to 4 times faster than the classical RT, while using information from only 5 sensors instead of the 24 used by the classical RT. Furthermore, the false positive and false negative rates of the proposed algorithm are less than 15%, which is evidence that the proposed algorithm is much more robust in detecting faults than the other RT-based approaches.

Limitation. In this study, we built our methodology around the RT algorithm proposed in [7]. In particular, we only considered the RT algorithm for fault detection, while there are still various other methods for this task, such as Principal Component Analysis (PCA) or Partial Least Squares (PLS). However, we believe that the d-ss algorithm would also be able to improve the model prediction quality of PCA and PLS when applied to these algorithms, since the d-ss algorithm eliminates variables which are not correlated with the target variable. Furthermore, in this experiment we only considered the exit time, i.e., the amount of time needed by the model parameter to leave the nominal bound, as the sole parameter to measure the fault diagnosis quality of the RT algorithms.

Suggestion. With respect to the limitations stated above, there are 2 possible next steps or suggestions we can offer. First, consider using the RF algorithm, which is a collection of RTs; this algorithm is able to help the RT avoid the overfitting issue that usually arises in any regression method. Next, there are still various properties of the exit time that can lead to further study. For example, one might consider the rate of change of the exit time in order to understand the damage level of the faulty dataset: as happened in our experimental setup, the stronger the damage, the faster the trace vector leaves the nominal bound. This parameter is particularly interesting, as it can tell the reader how severe the level of damage in a structure is.

In a real-life setup, the proposed algorithm can be used to provide a plan on where to install a set of sensors. In particular, one might first install sensors in all feasible positions in a structure. Then, after the dynamics are obtained, the user may remove any sensors that do not correlate well with the target sensors. In this way, the number of sensors needed in a structure can be reduced, which decreases the maintenance cost of a building without sacrificing the fault diagnosis quality.

REFERENCES

[1] H. V. Dang, H. Tran-Ngoc, T. V. Nguyen, T. Bui-Tien, G. De Roeck and H. X. Nguyen, 2020, Data-Driven Structural Health Monitoring Using Feature Fusion and Hybrid Deep Learning, IEEE Transactions on Automation Science and Engineering, Vol. 18, No. 4, pp. 2087-2103.

[2] D. A. Tibaduiza Burgos, R. C. Gomez Vargas, C. Pedraza, D. Agis and F. Pozo, 2020, Damage Identification in Structural Health Monitoring: A Brief Review From Its Implementation to The Use of Data-Driven Applications, Sensors, Vol. 20, No. 3, p. 733.

[3] M. Azimi, A. D. Eslamlou and G. Pekcan, 2020, Data-Driven Structural Health Monitoring and Damage Detection Through Deep Learning: State-of-The-Art Review, Sensors, Vol. 20, No. 10, p. 2778.


[4] R. J. Lewis, 2000, An Introduction to Classification and Regression Tree (CART) Analysis, in Annual Meeting of The Society for Academic Emergency Medicine in San Francisco, California.

[5] W.-Y. Loh, 2011, Classification and Regression Trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 1, No. 1, pp. 14-23.

[6] F. Smarra, G. D. D. Girolamo, V. Gattulli, F. Graziosi and A. D'Innocenzo, 2020, Learning Models for Seismic-Induced Vibrations Optimal Control in Structures Via Random Forests, Journal of Optimization Theory and Applications, Vol. 187, No. 3, pp. 855-874.

[7] F. Smarra, J. Tjen and A. D'Innocenzo, 2022, Learning Methods for Structural Damage Detection Via Entropy-Based Sensors Selection, International Journal of Robust and Nonlinear Control, Vol. 32, No. 10, pp. 6035-6067.

[8] S. D. Panjaitan, J. Tjen, B. W. Sanjaya, F. T. P. Wigyarianto and S. Khouw, 2022, A Forecasting Approach for IoT-Based Energy and Power Quality Monitoring in Buildings, IEEE Transactions on Automation Science and Engineering.

[9] Q. Ren, L. Ding, X. Dai, Z. Jiang and G. De Schutter, 2021, Prediction of Compressive Strength of Concrete With Manufactured Sand by Ensemble Classification and Regression Tree Method, Journal of Materials in Civil Engineering, vol. 33, no. 7.

[10] Z. Y. Jiang, 2021, Establishment and Optimization Of Sensor Fault Identification Model Based On Classification and Regression Tree and Particle Swarm Optimization, Materials Research Express, Vol. 8, No. 8, p. 085703.

[11] A. More and D. P. Rana, 2017, Review of Random Forest Classification Techniques To Resolve Data Imbalance, in 1st International Conference on Intelligent Systems and Information Management (ICISIM).

[12] J. Tjen, F. Smarra and A. D’Innocenzo, 2020, An Entropy-Based Sensor Selection Algorithm For Structural Damage Detection, in IEEE 16th International Conference on Automation Science and Engineering (CASE), Online Virtual Meeting.

[13] D. Edelmann, K. Fokianos and M. Pitsillou, 2019, An Updated Literature Review of Distance Correlation and Its Applications to Time Series, International Statistical Review, Vol. 87, No. 2, pp. 237-262.

[14] D. Edelmann, T. F. Mori and G. J. Szekely, 2021, On Relationships Between The Pearson And The Distance Correlation Coefficients, Statistics & Probability Letters, Vol. 169, p. 108960.

[15] M. E. Celebi, F. Celiker and H. A. Kingravi, 2011, On Euclidean Norm Approximation, Pattern Recognition, Vol. 22, No. 2, pp. 278-283.


[16] B. S. Everitt and A. Skrondal, 2010, The Cambridge Dictionary of Statistics, 4th Ed., Cambridge: Cambridge University Press.

[17] X. Huo and G. J. Szekely, 2016, Fast Computing for Distance Covariance, Technometrics, Vol. 58, No. 4, pp. 435-447.

[18] L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, 2017, Classification and Regression Trees, Routledge.

[19] Y. Kim and H. Bang, 2018, Introduction to Kalman Filter and Its Applications, Introduction and Implementations of the Kalman Filter, Vol. 1, pp. 1-16.

[20] F. Daum, 2005, Nonlinear Filters: Beyond The Kalman Filter, IEEE Aerospace and Electronic Systems Magazine, Vol. 20, No. 8, pp. 57-69.

[21] Engineering Institute of LANL, 2011, Available: https://www.lanl.gov/projects/national-security-education-center/engineering/software/shm-data-sets-and-software.php. [Accessed on 5/12/2021].

[22] J. I. Daoud, 2017, Multicollinearity And Regression Analysis, in Journal of Physics: Conference Series.

[23] B. Ghojogh and M. Crowley, 2019, The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial, arXiv preprint arXiv:1905.12787.
