Translated Paper

Equation discovery using mode extraction via singular value decomposition

Takuya Suzuki

R&D Institute, Takenaka Corporation, Inzai City, Japan

Correspondence

Takuya Suzuki, R&D Institute, Takenaka Corporation, 1-5-1 Otsuka, Inzai City, Chiba 270-1395, Japan.

Email: suzuki.takuya@takenaka.co.jp

The Japanese version of this paper was published in Volume 86, Number 783, pages 838–847, https://doi.org/10.3130/aijs.86.838, of Journal of Architectural and Planning (Transactions of AIJ). The authors have obtained permission for secondary publication of the English version in another journal from the Editor of Journal of Architectural and Planning (Transactions of AIJ). This paper is based on the translation of the Japanese version with some slight modifications.

Received November 15, 2022; Accepted December 8, 2022 doi: 10.1002/2475-8876.12327

Abstract

Many studies have been conducted to elucidate unknown input/output systems based on measured data obtained from experiments and observations. Specifically, neural networks, which have developed significantly in recent years, can be used for equation discovery. However, the identified networks are considerably large and difficult to understand. Many studies have been conducted on mathematical expressions that are easy to understand; however, there is a paucity of studies on methods to determine a simple equation by eliminating unnecessary terms. In this study, we propose a novel method for identifying unknown equations using mode extraction via singular value decomposition. In this method, the "process of improving the precision of an equation" and the "process of deleting unnecessary terms" are alternately iterated based on singular value decomposition. We confirm the applicability of the proposed method via a sample problem on cantilever deflection. The results show that the proposed method can accurately determine an equation of an input/output system composed of only essential terms (i.e., excluding unnecessary terms) from training data.

Keywords

equation discovery, inverse analysis, singular value decomposition

1. Introduction

Over the past several years, numerous studies have been conducted to reveal unknown input–output systems using data obtained from experiments or observations. Classic examples of natural laws derived from observations are the three laws derived by Kepler [1].

Neural networks can be considered an example of the identification of input–output systems because they are obtained by inductively identifying the coefficients of complex functions from training data. In recent years, neural networks based on deep learning have developed remarkably in fields such as image recognition and speech recognition [2]. However, neural networks obtained by deep learning have a large number of unknown coefficients; even if we can identify a system that reproduces the input–output relationship, it is challenging for humans to understand its inherent laws. The author also identified a recurrent neural network that reproduces a hysteretic model in Ref. [3]; however, understanding that system remains challenging. Therefore, there is a need to understand identified systems, and research on understanding neural networks has recently advanced [4].

In this study, we identify unknown input/output systems using functions comprehensible to humans, as shown in Figure 1. This identification is hereinafter referred to as equation discovery. Once a human-understandable equation is discovered, validating and verifying the identified system becomes easier. This is relevant not only in the field of architecture but also in other scientific fields.

Numerous studies on equation discovery for such unknown input/output systems have been conducted previously. A well-known example is Langley's BACON [5]. Iba et al. performed equation discovery using genetic programming [6]. Matayoshi et al. performed function identification using a genetic algorithm [7]; in this method, equation discovery is performed by representing the function in prefix notation on the chromosome. It has been confirmed that this method can identify functions that reproduce unknown input/output systems with high accuracy. However, it also generates meaningless terms, which increases the expression length.

To understand the meaning of the system, it is desirable that these meaningless terms are not generated and that the discovered equation comprises only essential terms.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

©2022 Takenaka Corporation. Japan Architectural Review published by John Wiley & Sons Australia, Ltd on behalf of Architectural Institute of Japan.



An improved method combining ridge regression, a form of sparse estimation, has been proposed in Ref. [8]. However, it cannot yet eliminate meaningless terms.

In this study, we propose a novel method for identifying unknown equations by a mathematical expression composed only of essential terms, without unnecessary terms, and we verify the proposed method using examples. In the proposed method, equation discovery is treated as a search problem for the unknown coefficients of an equation. The proposed method alternately repeats the "process of improving the precision of an equation" and the "process of deleting unnecessary terms" based on the result of the singular value decomposition of the Jacobian matrix. This makes it possible to identify equations consisting of only essential terms while ensuring the reproducibility of the input–output system. First, the outline of the proposed method and its calculation procedure are explained. Then, the accuracy of the proposed method is verified using examples.

2. Proposed Method for Equation Discovery

This section shows the basic form of the equation to be identified, the features of the proposed function identification method, and its calculation procedure. Table 1 lists the major symbols used in the following sections. All vectors in this study are treated as column vectors.

2.1 The basic form of the equation and assumed inverse problem

The function f of the multiple-input–one-output system used for identification is shown in Equation (1).

The equation to be identified, as shown in Equation (1), is a sum of terms; each term is the product of powers of the input parameters multiplied by a weight coefficient. This form allows various functions to be represented; however, there is a restriction that only positive values are allowed for the input parameters. The weight coefficients w_i and the exponents p_ij are unknown coefficients, and various functions can be represented by changing these values. The vector summarizing these unknown coefficients is hereinafter referred to as the coefficient vector {c}.

y = f(x_1, x_2, ⋯, x_{N_I}) = Σ_{i=1}^{N_T} ( w_i · Π_{j=1}^{N_I} x_j^{p_ij} )    (1)
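For illustration, a minimal sketch of how this candidate form can be evaluated numerically is given below. The packing of {c} as the N_T weights followed by the flattened exponent matrix, and all function and variable names, are assumptions of this sketch rather than part of the paper.

```python
import numpy as np

def evaluate_candidate(c, X, n_terms):
    """Evaluate y = sum_i w_i * prod_j x_j**p_ij (Equation (1)) for each sample.

    c       : coefficient vector {c}; the first n_terms entries are assumed to be the
              weights w_i, the remaining n_terms * N_I entries the exponents p_ij.
    X       : (N_D, N_I) array of positive input parameters, one training sample per row.
    n_terms : number of terms N_T of the candidate equation.
    """
    n_inputs = X.shape[1]
    w = c[:n_terms]                                # weight part w_i
    p = c[n_terms:].reshape(n_terms, n_inputs)     # exponent part p_ij
    # power products for every sample (axis 0) and every term (axis 1)
    terms = np.prod(X[:, None, :] ** p[None, :, :], axis=2)
    return terms @ w                               # predicted outputs, shape (N_D,)
```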

Figure 2 shows a diagram of the inverse problem assumed for identifying the coefficient vector {c} from the given training data. As shown in Figure 2, once the coefficient vector {c} is determined, the function f can be determined, and the error for each training datum can be easily calculated. In this study, calculating the error vector {r}, which summarizes these errors, from the coefficient vector {c} is regarded as the forward analysis. Further, the coefficient vector {c} that makes the error vector {r} the zero vector is searched for by the inverse analysis method described below.

2.2 Features of the proposed method

This section describes the features of the proposed method.

The proposed method alternately repeats the "correction to reduce errors" and the "correction to remove unnecessary components."

The former is a process of "searching for a coefficient vector {c} that reduces the norm of the error vector {r}." Many inverse analysis methods are primarily focused on this purpose.

However, if the processing is performed only to reduce the error, unnecessary components of the input vector that do not affect the error are not corrected. Therefore, once such unnecessary components are mixed into the coefficient vector {c} during the search process, they are not removed by the iterative correction. This leads to the generation of unnecessary terms in equation discovery. Thus, in the proposed method, a process of "removing unnecessary components mixed into the coefficient vector {c} that do not affect the error" is added. This improvement makes it possible to identify functions consisting of only essential terms while ensuring accuracy in reproducing the training data.

TABLE 1. Symbol list

N_C        Number of unknown coefficients of the identified equation
N_D        Number of training datasets
N'         Number of selected main modes
N_R        Number of selected unnecessary modes
{c}        Unknown coefficient vector of the identified equation
{c_init}   Initial unknown coefficient vector of the identified equation
{r}        Residual error vector
[K]        Jacobian matrix
[U]        Left singular matrix (output mode matrix)
[V]        Right singular matrix (input mode matrix)
[Σ]        Singular value matrix
[K']       Jacobian matrix consisting of the main modes
[V_R]      Right singular matrix consisting of the unnecessary modes
f( )       Identified calculation system
G( )       Error vector calculation system
{Δc}       Incremental coefficient vector

FIGURE 1. Equation discovery
FIGURE 2. Inverse problem


The former process uses the modal iterative error correction method shown in Ref. [9]. Modal iterative error correction is a method for iteratively correcting errors by selecting significant modes via the singular value decomposition of a Jacobian matrix, which consists of the partial derivatives of the error vector with respect to each component of the input vector. Analytical or numerical derivatives may be used to calculate the Jacobian matrix.

The overall flow is similar to the Newton–Raphson method [10]. The inverse matrix of the Jacobian matrix is composed of only the major modes with large singular values, which enables efficient error reduction and improves convergence performance. Because the dimensions of the input and output vectors can be set arbitrarily, the method can be applied to the inverse analysis of various nonlinear input–output systems.

Thus far, its applicability has been confirmed in input motion inversion [9], spectrum fitting of simulated earthquake motion [11], shape optimization of shear panel dampers [12], and identification of building damping [13].

For the latter process, a denoising technique proposed for simplifying neural networks in Ref. [14] is used. This method focuses on the small-singular-value modes obtained by singular value decomposition of the Jacobian matrix and reduces these modes as unnecessary components included in the input vector. That is, from the input mode matrix obtained via singular value decomposition, only the minute-singular-value modes are extracted, and the corresponding components included in the input vector are reduced by a predetermined reduction rate. By repeating these two processes alternately, it is possible to identify mathematical equations composed of only essential terms while ensuring accurate reproduction of the training data.
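As a small illustration of this mode split, the following sketch partitions the singular value decomposition of a Jacobian into main and minute (unnecessary) modes; the relative threshold is an assumed parameter here (Section 3 uses 10^−9 to 10^−5 times the principal singular value).

```python
import numpy as np

def partition_modes(K, rel_threshold=1e-9):
    """Split the SVD of the Jacobian [K] into main modes and minute-singular-value modes."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    main = s >= rel_threshold * s[0]    # large singular values: used for error reduction
    minute = ~main                      # minute singular values: candidates for removal
    return (U[:, main], s[main], Vt[main, :]), (U[:, minute], s[minute], Vt[minute, :])
```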

2.3 Calculation procedure

Figure 3 shows a calculation flow diagram of the proposed method. The matrix sizes are listed below each matrix.

The overall flow is similar to the general iterative method.

At odd-numbered iterations, including the first iteration, the "corrections that reduce errors" (steps (vi), (vii), and (viii)) are performed. At even-numbered iterations, the "corrections to remove unnecessary components" (steps (x), (xi), and (viii)) are performed.

Each process of the calculation procedure, focusing on the features of the proposed method, is described below. Details of each process are simplified; for further details, refer to Refs. [9,14].

(i) Set the initial value of the coefficient vector, {c_init}.

{c} = {c_init}    (2)

(ii) Calculate the error for each training datum using the equation determined from the coefficient vector {c}. The error vector {r} collects these errors. G denotes the process of calculating {r} from {c}.

{r} = G({c})    (3)

(iii) Calculate the Jacobian matrix (partial derivative matrix) of the error vector {r} with respect to each component of the coefficient vector {c}.

[K] = ∂{r}/∂{c} =
[ ∂r_1/∂c_1      ⋯  ∂r_1/∂c_{N_C}     ]
[      ⋮         ⋱        ⋮           ]
[ ∂r_{N_D}/∂c_1  ⋯  ∂r_{N_D}/∂c_{N_C} ]    (4)

where [K] is an N_D × N_C matrix.

(iv) Perform singular value decomposition of the Jacobian matrix.

[K] = [U][Σ][V]^T    (5)

where [K] is N_D × N_C, [U] is N_D × N_D, [Σ] is N_D × N_C, and [V]^T is N_C × N_C.

(v) Check whether the current iteration count is odd or even. If it is odd, branch to procedure (vi), the "correction to reduce the error"; if it is even, branch to procedure (x), the "correction to remove unnecessary components."

(vi) Construct a generalized inverse matrix from the degenerate submatrices obtained by ignoring the modes with zero or relatively small singular values.

[K']^+ = [V'][Σ']^(−1)[U']^T    (6)

where [K']^+ is N_C × N_D, [V'] is N_C × N', [Σ']^(−1) is N' × N', and [U']^T is N' × N_D.

(vii) Calculate the correction vector {Δc} from the error vector calculated in step (ii) and the generalized inverse matrix calculated in step (vi).

{Δc} = −[K']^+ {r}    (7)

(viii) Calculate the new coefficient vector from the correction vector and the old coefficient vector.

{c} = {c} + {Δc}    (8)

(ix) Terminate the calculation if the termination condition is satisfied; otherwise, continue the iterative calculation and proceed to the "correction to remove unnecessary components" for an even-numbered iteration. The process is the same as for an odd-numbered iteration up to the calculation of the error vector in step (ii), the creation of the Jacobian matrix in step (iii), and the singular value decomposition in step (iv); it then proceeds to step (x) through the branch in step (v).

(x) From the input mode matrix [V] obtained in step (iv), extract only the modes with small singular values to create a degenerate input mode matrix [V_R].

[V] (N_C × N_C) → [V_R] (N_C × N_R)    (9)

(xi) Calculate the contribution of the unnecessary modes extracted in step (x) to the current coefficient vector {c}. Then, multiply this contribution by a predetermined reduction rate λ to obtain the correction vector that removes it.


{Δc} = −λ [V_R][V_R]^T {c}    (10)

where [V_R] is N_C × N_R. Thereafter, correct the coefficient vector by step (viii) and repeat steps (ii) to (xi) until the iteration termination condition is satisfied.
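A compact sketch of the whole alternating procedure (steps (ii) to (xi)) is given below. This is an illustrative re-implementation, not the author's code: the fixed main-mode count n_main and the thresholds are simplifying assumptions (the paper selects the main modes by the norm-ratio rule described in Section 3), while the finite-difference perturbation of 0.0001 and the reduction rate λ = 0.10 follow the values reported there.

```python
import numpy as np

def numerical_jacobian(G, c, eps=1e-4):
    """Step (iii): finite-difference Jacobian [K] of the error vector {r} = G({c})."""
    r0 = G(c)
    K = np.empty((r0.size, c.size))
    for j in range(c.size):
        cp = c.copy()
        cp[j] += eps
        K[:, j] = (G(cp) - r0) / eps
    return K

def identify(G, c_init, n_iter=2000, n_main=10, null_ratio=1e-9, lam=0.10):
    """Alternate error-reducing and mode-removing corrections (steps (ii)-(xi))."""
    c = c_init.astype(float).copy()                       # step (i)
    for it in range(n_iter):
        r = G(c)                                          # step (ii)
        K = numerical_jacobian(G, c)                      # step (iii)
        U, s, Vt = np.linalg.svd(K, full_matrices=False)  # step (iv)
        if it % 2 == 0:                                   # step (v): odd-numbered iteration
            # steps (vi)-(vii): truncated generalized inverse built from the main modes
            n = min(n_main, int(np.count_nonzero(s > null_ratio * s[0])))
            K_pinv = Vt[:n].T @ np.diag(1.0 / s[:n]) @ U[:, :n].T
            dc = -K_pinv @ r                              # Equation (7)
        else:                                             # even-numbered iteration
            # steps (x)-(xi): damp the minute-singular-value mode content of {c}
            VR = Vt[s < null_ratio * s[0]].T              # [V_R], shape (N_C, N_R)
            dc = -lam * (VR @ (VR.T @ c)) if VR.size else np.zeros_like(c)  # Equation (10)
        c = c + dc                                        # step (viii)
    return c
```

In use, G would be a closure that evaluates the candidate equation on the training inputs and returns the residuals of Equation (13).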

3. Accuracy Verification

In this section, the equation discovery method proposed in Section 2 is verified. A theoretical formula for the deflection of a cantilever beam is used as an example, as shown in Figure 4. First, the applicability is verified when the input variables of the equation are known and there are no errors in the training data. Then, the applicability is verified when the input variables are unknown and an unnecessary variable is included. In addition, the applicability in the presence of measurement errors in the training data is verified.

3.1 Sample problem

The equation for the deflection δ of the cantilever beam considering shear deformation [15] to be searched for is shown in Equation (11). The equation consists of two terms: the first term corresponds to the bending deformation, and the second term corresponds to the shear deformation.

δ = QL^3/(3EI) + QL/(GAs)    (11)

where Q is the shear force, E is Young's modulus, G is the shear modulus, I is the moment of inertia of area, As is the shear cross-sectional area, and L is the beam length.

From these six input values, the deflection δ is calculated using Equation (11). We confirm that the proposed method can identify the equation of this "six-input–one-output system" from the training data.

FIGURE 3. Flow diagram

(5)

3.2 Analytical condition

Equation (12) is used for the first identification.

Based on the equational form of Equation (1), the number of input variables N_I is six and the number of terms N_T is six. Thus, the number of unknown coefficients is 42, consisting of 36 exponent coefficients and 6 weight coefficients.

δ = f(E, G, L, Q, As, I) = Σ_{i=1}^{6} w_i E^{p_Ei} G^{p_Gi} L^{p_Li} Q^{p_Qi} As^{p_Asi} I^{p_Ii}    (12)

The number of terms N_T is six, which is larger than the two terms of the correct equation.

This is so that the form can represent various functions, under the assumption that the true function is unknown.

Table 2 shows an example of the coefficients of the equation that is in agreement with Equation (11).

We confirmed that identifying a simple equation with only two terms is possible even if the number of terms N_T is set to six. Table 3 shows the initial values of the unknown coefficients. The initial values are basically 1, and only one exponent coefficient in each term is set to 2. This is to avoid search delays due to the existence of multiple solutions.

The number of training data N_D is 300. The correct deflection is calculated as training data using randomly varied input parameters. Table 4 shows the range of each input parameter for generating the training data. These ranges are set assuming realistic values for reinforced concrete and steel beams.
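As an illustration, training data for this example could be generated as in the sketch below; the uniform sampling of each parameter and the fixed random seed are assumptions, while the ranges follow Table 4 and the deflection follows Equation (11).

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed, assumed for reproducibility
N_D = 300                                   # number of training data

# Input parameter ranges from Table 4 (uniform sampling is an assumption)
E  = rng.uniform(20_000, 200_000, N_D)      # Young's modulus, N/mm^2
G  = rng.uniform(5_000, 100_000, N_D)       # shear modulus, N/mm^2
L  = rng.uniform(500, 10_000, N_D)          # beam length, mm
Q  = rng.uniform(100, 10_000_000, N_D)      # shear force, N
As = rng.uniform(4_000, 400_000, N_D)       # shear cross-sectional area, mm^2
I  = rng.uniform(100_000, 10_000_000, N_D)  # moment of inertia of area, mm^4

# Correct deflection from Equation (11): bending term + shear term
delta = Q * L**3 / (3 * E * I) + Q * L / (G * As)

X = np.column_stack([E, G, L, Q, As, I])    # (N_D, 6) training inputs
y = delta                                   # (N_D,) training outputs
```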

The error of the identified equation is evaluated by an error vector {r} composed of the error r_k for each training datum, as shown in Equation (13), where the subscript k corresponds to the k-th training datum.

r_k = δ_k − f(E_k, G_k, L_k, Q_k, As_k, I_k)    (13)

The function f is varied via the unknown coefficient vector {c}.

The process of calculating the error vector {r} from the unknown coefficient vector {c} corresponds to the function G shown in Equation (3).

The main modes for error reduction are extracted by the "norm ratio of the error vector" [16] shown in Figure 5. The horizontal axis is the number of extracted modes N'. The vertical axis is the norm of the error vector {r'} that consists of only the extracted modes up to order N'.

In addition, {r'_main} is the error vector that consists of only the n_main modes, excluding the modes with sufficiently small singular values. The number of modes used for error correction is set as the number of modes at which the ratio of the norm of {r'} to the norm of {r'_main} reaches a threshold η_limit.

The norm ratio of the error vectors is expressed by Equation (14). Here, the modes with sufficiently small singular values are those with singular values less than 10^−9 times the principal singular value.

|{r'}| / |{r'_main}| = √( Σ_{i=1}^{N'} ({u_i}^T {r})^2 / Σ_{i=1}^{n_main} ({u_i}^T {r})^2 )    (14)

where {u_i} is the i-th column of the output mode matrix [U].

In this method, when the error vector consists of only lower-order modes, the correction is performed using only lower-order modes; when higher-order modes are included in the error vector, higher-order modes are also used for the correction. In this study, as in Ref. [13], the threshold η_limit was set at 0.2.
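A minimal sketch of this selection rule is given below, assuming the stated thresholds (modes below 10^−9 times the principal singular value excluded from n_main, and η_limit = 0.2); the function name and interface are assumptions.

```python
import numpy as np

def select_main_modes(U, s, r, eta_limit=0.2, sv_ratio=1e-9):
    """Choose N' by the norm ratio of the error vector (Equation (14))."""
    n_main = int(np.count_nonzero(s >= sv_ratio * s[0]))  # drop minute-singular-value modes
    proj = (U[:, :n_main].T @ r) ** 2                     # ({u_i}^T {r})^2 for each mode
    ratio = np.sqrt(np.cumsum(proj) / np.sum(proj))       # |{r'}| / |{r'_main}| for N' = 1..n_main
    return int(np.argmax(ratio >= eta_limit)) + 1         # smallest N' that reaches the threshold
```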

TABLE 2. Coefficients for no error

i   w      pE   pG   pL   pQ   pAs   pI
1   0.333  −1    0    3    1    0    −1
2   1       0   −1    1    1   −1     0
3   0       0    0    0    0    0     0
4   0       0    0    0    0    0     0
5   0       0    0    0    0    0     0
6   0       0    0    0    0    0     0

Shaded values indicate negligible terms.

TABLE 3. Initial coefficients

i w pE pG pL pQ pAs pI

1 1 2 1 1 1 1 1

2 1 1 2 1 1 1 1

3 1 1 1 2 1 1 1

4 1 1 1 1 2 1 1

5 1 1 1 1 1 2 1

6 1 1 1 1 1 1 2

TABLE 4. Input parameter range for training data

Input parameter    Range
E (N/mm²)          20 000–200 000
G (N/mm²)          5000–100 000
L (mm)             500–10 000
Q (N)              100–10 000 000
As (mm²)           4000–400 000
I (mm⁴)            100 000–10 000 000

FIGURE 4. Cantilever deflection
FIGURE 5. Main mode selection method


The modes used for removing unnecessary input components are the modes with singular values less than 10^−9 times the principal singular value. The mode reduction ratio λ is 0.10.

Numerical differentiation is used to calculate the Jacobian matrix, with a perturbation of 0.0001 for each unknown coefficient. The iteration ends after 2000 iterations; that is, the corrections for reducing errors are made 1000 times and the corrections for removing unnecessary components are made 1000 times.

3.3 Identification result

This section explains the identification results obtained under the analysis conditions presented in the previous section.

In Figure 6, the convergence of the norm of the error vector {r} is shown as a solid black line. For comparison, the results without removing unnecessary input components are shown as a solid gray line. The circle in the figure indicates the minimum error.

From the figure, it can be confirmed that the error norm, which was approximately 10^17 mm at the beginning, decreased to approximately 500 mm in about 250 iterations. The minimum error norm was 0.007 mm after 747 iterations, after which it remained almost constant. In the absence of the processing to remove unnecessary input components, a relatively large error norm remained, with a minimum error of approximately 0.146 mm.

In Figure 7, the convergence of the norm of the identified coefficient vector {c} is shown as a solid black line. As in Figure 6, the results without removing unnecessary input components are also shown with a solid gray line. Further, the circle in the figure indicates the iteration with the minimum error norm.

The dashed line in the figure indicates the norm of the correct coefficient vector shown in Table 2. The figure shows that the coefficient norm, which was 60 at the beginning, gradually decreased with the convergence of the error norm. It then became smaller than the correct value once; however, it increased again as the error norm decreased. After the error norm was minimized, the value of the coefficient norm roughly matched the correct answer.

In the case of not removing unnecessary input components, the process of decreasing the coefficient norm is almost the same as the case with the removing process. However, the coefficient norm converged to 25.9, which was slightly larger than the correct value. This indicates that unnecessary terms are generated in the identified equation.

In Figure 8, the number of modes judged to be unnecessary is indicated by a solid black line for the case in which unnecessary input components are removed. Further, the solid gray line indicates the norm of the reduced coefficient vector.

The figure shows that no unnecessary modes were selected and the removal process was not performed in the early iterations. Unnecessary modes were selected after about 250 iterations, and the coefficient vector was accordingly reduced. This timing coincides with the iteration count at which the convergence curves with and without the removal of unnecessary modes begin to differ in Figure 7.

After 747 iterations, when the error norm remained constant, the removal process was not performed although unnecessary modes existed. This may be because the current input components no longer contained unnecessary components. Tables 5 and 6 show the identified coefficients at the minimum error norm for each analysis case. Values in the tables are given to three decimal places; although they appear to be integers, they are identified real values, not integers.

Table 5 shows that most of the identified coefficients are zero. The equation expressed by excluding these zero coefficients agrees with Equation (11).

FIGURE 6. RSS error

FIGURE 7. Norm of the coefficient vector

FIGURE 8. Redacted mode vector

TABLE 5. Coefficients identified by the proposed method

i   w       pE       pG       pL      pQ      pAs      pI
1   0.000   0.000    0.000    0.000   0.000   0.000    0.000
2   0.000   0.000    0.000    0.000   0.000   0.000    0.000
3   0.000   0.000    0.000    0.000   0.000   0.000    0.000
4   0.333   −1.000   0.000    3.000   1.000   0.000    −1.000
5   0.000   0.000    0.000    0.000   0.000   0.000    0.000
6   1.000   0.000    −1.000   1.000   1.000   −1.000   0.000

Shaded values indicate negligible terms.


However, Table 6 shows that few of the identified coefficients are near zero in the case of not removing unnecessary components. The fourth and sixth terms have almost the same coefficients as in Table 5; however, dominant values have been identified in the other terms. Therefore, the equation expressed by these coefficients is significantly different from Equation (11). This indicates that obtaining an equation in a simple form only by processing to reduce errors is difficult.

As described above, we confirmed that the proposed method, which combines the "correction to reduce errors" and the "correction to remove unnecessary components," can express an unknown input/output system as a simple equation.

3.4 Effect of input parameters that do not affect the output

In the previous section, the six input variables (E, G, L, As, Q, and I) were known. However, when trying to discover an unknown input/output system, the input variables themselves are often also unknown. In such cases, more candidate input variables than strictly necessary must be set to ensure that the required variables are included, which can cause unnecessary input variables to be included among the candidates. Therefore, in this section, the applicability of the proposed method is verified for the case where unnecessary input variables are included as candidates. The equation for the deflection of the cantilever beam is the same as in the previous section, as shown in Figure 4 and Equation (11); however, the temperature T (degrees Celsius) is added as a candidate input variable. The identified equation is then set as follows:

δ = f(E, G, L, Q, As, I, T) = Σ_{i=1}^{6} w_i E^{p_Ei} G^{p_Gi} L^{p_Li} Q^{p_Qi} As^{p_Asi} I^{p_Ii} T^{p_Ti}    (15)

The number of input variables N_I is changed from 6 to 7. The number of terms N_T is 6, the same as in the previous section, and the number of identified coefficients is 48. The number of training data is set at 300, as in the previous section. The temperature T is determined randomly in the range of 1–300 degrees, and Equation (11), which does not include the temperature T, is used to calculate the deflection of the beam.

The initial values of the exponent part p_Ti for the temperature T are all set to 1. The values in Table 3 are used for the initial values of all other coefficients.

In Figure 9, the convergence of the norm of the error vector {r} is shown as a solid black line. In Figure 10, the convergence of the norm of the identified coefficient vector {c} is shown as a solid black line. The circle in each figure indicates the point of the minimum error norm. For comparison, the results of the case using six input parameters in the previous section are also shown with a solid gray line.

From the figure, we confirmed that the error norm decreased smoothly even when an unnecessary input parameter T was included. Further, the minimum error of 0.007 mm was the same value as the case without an unnecessary parameter T.

The norm of the coefficient vector at the end of the convergence calculation was 17.1, which was also the same value as the correct answer, as in the case without an unnecessary parameter. Table 7 shows the identified coefficients; the rightmost column lists the exponent of the temperature, p_Ti. The table shows that all values of p_Ti are 0.000, indicating that the temperature T is an unnecessary parameter for the deflection of the beam. The values of the coefficients in the other parts were all consistent with those in Table 5.

Therefore, we confirmed that the accuracy of the identified equation did not change with the presence or absence of unnecessary parameters.

TABLE 6. Coefficients identified without the reduction process

i   w       pE       pG       pL      pQ       pAs      pI
1   0.898   0.848    −0.061   0.161   −0.456   −0.195   −0.498
2   0.899   −0.122   0.928    0.173   −0.438   −0.181   −0.489
3   0.902   0.099    0.021    1.175   0.394    0.150    0.450
4   0.333   −1.000   0.000    3.000   1.000    0.000    −1.000
5   0.898   0.134    0.064    0.161   0.462    0.779    0.509
6   0.998   0.000    −1.000   1.000   1.000    −1.000   0.000

Shaded values indicate negligible terms.

FIGURE 9. Effect of unnecessary input on RSS error

FIGURE 10. Effect of unnecessary input on the norm of the coefficient vector

TABLE 7. Identified coefficients (seven inputs)

i   w       pE       pG       pL      pQ      pAs      pI       pT
1   0.000   0.000    0.000    0.000   0.000   0.000    0.000    0.000
2   0.000   0.000    0.000    0.000   0.000   0.000    0.000    0.000
3   0.000   0.000    0.000    0.000   0.000   0.000    0.000    0.000
4   0.333   −1.000   0.000    3.000   1.000   0.000    −1.000   0.000
5   0.000   0.000    0.000    0.000   0.000   0.000    0.000    0.000
6   1.000   0.000    −1.000   1.000   1.000   −1.000   0.000    0.000

Shaded values indicate negligible terms.


As described above, we confirmed that the proposed method is effective even when unnecessary input parameters are included. Thus, this method is applicable even when the true input parameters are unknown.

3.5 Effect of measurement error in training data

Up to the previous section, the training data were true values that did not include measurement errors.

However, various measurement errors can occur when acquiring training data. As shown in Figure 11, data different from the true values may be used as training data to perform equation discovery.

Therefore, the applicability of the proposed method is examined in the case where the training data include measurement errors.

When measurement errors are included, an identified equation that perfectly reproduces the training data reproduces outputs that include the measurement errors; it is therefore an incorrect equation. In addition, many unnecessary terms representing the effect of the measurement errors may be mixed into the identified equation. Therefore, we expected that the search for simple equations and the understanding of the identified equations would become challenging.

Therefore, this study proposes a method to identify an equation while ignoring errors within a particular range.

Specifically, as shown in Figure 12, a tolerance was set in advance, and the proposed equation identification method was performed under the assumption that no error exists within that tolerance.

This made it possible to search for a function that passes through the “range” corresponding to the training data, instead of a function that passes through the training data “points.”

As a calculation process, the error r_k for each training datum calculated by Equation (13) was replaced with r'_k calculated by Equation (16), where α is the tolerance ratio.

r'_k = r_k   (if |f(E_k, G_k, L_k, Q_k, As_k, I_k)/δ_k − 1| > α)
r'_k = 0     (otherwise)    (16)

The 300 training data used up to the previous section were modified and used. A measurement error of ±1% was randomly given to each deflection δ; measurement errors were not considered for the seven input parameters. The temperature was again treated as an input parameter. Table 8 shows the analysis cases. Case N is a case in which no tolerance was set. The remaining two cases are those in which the tolerance was set to 5% of the output of the training data.

Case A5_R9 is a case in which the threshold of the singular value for judging unnecessary modes was 10^−9 times the principal singular value. Case A5_R5 is a case in which the threshold was set to 10^−5 times the principal singular value, such that more modes are judged unnecessary. More simplified functions could be identified by removing more modes.

The other analysis conditions were the same as those discussed in the previous section. However, the number of iterations was increased to 20 000 because larger oscillations were observed here compared to the previous sections.
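A minimal sketch of the tolerance-masked residual of Equation (16) is shown below; it assumes the evaluate_candidate helper sketched earlier and treats α as the relative tolerance on each output.

```python
import numpy as np

def tolerant_residual(c, X, delta, n_terms, alpha=0.05):
    """Equation (16): zero the residuals whose relative deviation is within the tolerance."""
    pred = evaluate_candidate(c, X, n_terms)      # f(E_k, G_k, ..., T_k), sketched earlier
    r = delta - pred                              # Equation (13)
    within = np.abs(pred / delta - 1.0) <= alpha  # relative deviation inside the tolerance
    return np.where(within, 0.0, r)
```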

Figure 13 shows the convergence of the norm of the error vector {r}. Case N corresponds to the black dashed line, Case A5_R9 to the solid gray line, and Case A5_R5 to the solid black line. For Case N, the error norm diverged after more than 300 iterations, and the calculation was terminated. This may be because it was challenging to reproduce training data containing measurement errors using the form of Equation (11). However, for the two cases in which the allowable error was set, the error norm decreased to about 5.0E6 mm after 500 iterations, and the calculation was completed without divergence.

FIGURE 11. Measurement error

FIGURE 12. Setting allowable error

TABLE 8. Case list

Case name   Allowable error   Threshold SV ratio for null modes
N           None              10^−9
A5_R9       5%                10^−9
A5_R5       5%                10^−5

FIGURE 13. Effect of measurement error on RSS error


The minimum errors were 1.8E4 mm for Case A5_R9 and 5.7E5 mm for Case A5_R5. Case A5_R9, with fewer unnecessary modes selected, had the smaller minimum error. In the figure, the error between the training data and the values calculated using Equation (11), which is the correct answer, is indicated by a horizontal dotted line. The minimum error for Case A5_R9 was smaller than this error, which indicates that the Case A5_R9 result may overfit the measurement errors.

Figure 14 shows the convergence of the norm of the identified coefficient vector {c}. The line types are the same as in Figure 13. From the figure, we confirmed that Case N turned sharply toward divergence just before the forced termination. For the remaining two cases, the calculation ended without any significant oscillation.


The values of the norms at the end of the calculation were 20.1 for Case A5_R9 and 12.5 for Case A5_R5. The former was larger than the norm of 17.1 calculated from the correct coefficients, and the latter was smaller. Therefore, in the former, unnecessary terms may have been mixed in, and in the latter, necessary terms may have been deleted. Tables 9 and 10 list the identified coefficients at the end of the iterations for these two cases. Values less than 0.1 are shaded as negligible terms. Table 9 shows that there are few negligible coefficients among those identified in Case A5_R9, and the equation expressed by these coefficients differs significantly from Equation (11). From the results of Figures 13 and 14, it can be confirmed that merely setting the tolerance, as in Case A5_R9, does not prevent overfitting. Once overfitting occurs, understanding the essential equation becomes difficult.

However, in Case A5_R5, shown in Table 10, many coefficients are negligible. Ignoring these, the equation becomes simpler, as follows:

δ = 0.319 Q^0.904 L^3.115 / (E^0.964 I^0.975)    (17)

This equation is similar to the bending deflection term in Equation (11), which is the correct answer. Alternatively, the shear deflection term may have been removed because its influence on the total deflection was small. However, it is considered highly meaningful that the more important, high-impact terms can be obtained in a simplified form.

The difference in the reduction of the coefficient vector in each case was also examined. Figure 15 shows the number of unnecessary modes removed from the coefficient vector in each case. Results are shown up to 500 iterations, where the norms of the error and coefficient vectors were approximately constant. The timing at which the reduction of the coefficient vector started was almost the same for Case A5_R9 and Case N; after that, many reductions were performed in Case A5_R9, which may have prevented the divergence. For Case A5_R5, with more modes removed, the reduction started even earlier and was performed at every opportunity after 250 iterations.

FIGURE 14. Effect of measurement error on the norm of the coefficient vector

TABLE 9. Identified coefficients (A5_R9)

i   w       pE       pG       pL      pQ       pAs      pI      pT
1   0.259   0.377    0.307    0.023   0.044    0.249    0.025   0.287
2   0.701   0.042    1.040    0.360   0.267    0.011    0.399   0.500
3   0.710   0.261    0.016    0.976   0.183    0.094    0.249   0.510
4   0.305   0.968    0.020    3.082   0.921    0.031    0.994   0.009
5   0.530   0.070    0.087    0.666   0.216    0.704    0.565   0.395
6   0.624   −0.076   −0.052   0.028   −0.482   −0.047   0.773   0.487

Shaded values indicate negligible terms.

TABLE 10. Identified coefficients (A5_R5)

i   w       pE       pG      pL      pQ      pAs     pI       pT
1   0.000   0.000    0.000   0.000   0.000   0.000   0.000    0.000
2   0.000   0.000    0.000   0.000   0.000   0.000   0.000    0.000
3   0.000   0.000    0.000   0.000   0.000   0.000   0.000    0.000
4   0.319   −0.964   0.021   3.115   0.904   0.006   −0.975   0.004
5   0.000   0.000    0.000   0.000   0.000   0.000   0.000    0.000
6   0.000   0.000    0.000   0.000   0.000   0.000   0.000    0.000

Shaded values indicate negligible terms.


FIGURE 15. Number of redacted modes. (a) Case N, (b) Case A5_R9, (c) Case A5_R5


This increase in reduction may have enabled the identification of a simpler equation.

As described above, we confirmed that when measurement errors are included in the training data, setting a tolerance and extending the range of unnecessary modes are effective.

However, the appropriate tolerance and the appropriate range of unnecessary modes have not been clarified; these values need to be sought by trial and error.

4. Conclusion

In this study, we proposed a novel method for identifying unknown equations using mode extraction via singular value decomposition. This method is based on the modal iterative error correction method, which can effectively solve inverse problems with strong discontinuities. Additionally, the proposed method includes a process for removing null modes to obtain a simple equation without unnecessary terms.

The key findings of this study are as follows:

1 We consider equation discovery as an inverse problem for unknowns such as coefficients and exponents. Thus, a novel method is proposed wherein the "process of improving the precision of an equation" and the "process of deleting unnecessary terms" are alternately iterated based on the singular value decomposition of the Jacobian matrix of the unknown parameters. This method identifies functions consisting of only essential terms and ensures the reproducibility of unknown input/output systems.

2 The applicability of the proposed equation discovery method is confirmed via a sample problem on cantilever deflection. An equation of an input/output system composed of only essential terms (i.e., excluding unnecessary terms) can be identified with high precision when correct answer values are provided as training data.

3 We confirmed that the proposed method is effective even in the presence of input variables that are unrelated to the input/output system. Furthermore, the necessity of setting tolerance errors and of expanding the range of null modes when measurement errors are included in the training data is confirmed.

Acknowledgments

We would like to thank Editage (www.editage.com) for English language editing.

Data Availability Statement

The data that has been used is confidential.

Disclosure

Author Takuya Suzuki is employed by Takenaka Corporation. The author declares that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1 Onitsuka S. Venus, Mars and Jupiter: Galileo, Kepler and Newton (series on history of science). Buturi Kyoiku. 1998;46(6):335-342. (in Japanese).

2 Okatani T. Deep Learning. Kodansha; 2015. (in Japanese).

3 Suzuki T, Matsuoka Y. Basical study on hysteretic modeling by using recurrent neural network. J Struct Constr Eng (Transactions of AIJ). 2017;82(734):543-553. (in Japanese).

4 Fukushima T, Fujimaki R, Okanohara D, Sugiyama M. Outlook for big data and machine learning: cutting-edge technological challenges and expanding applications. J Inf Process Manag. 2017;60(8):543-554. (in Japanese).

5 Langley P, Simon HA, Bradshaw GL, Zytkow JM. Scientific Discovery: Computational Explorations of the Creative Process. MIT Press; 1987.

6 Iba H, Sato T, de Garis H. System identification approach to genetic programming. JSAI. 1994;10(4):590-600. (in Japanese).

7 Matayoshi M, Nakamura M, Miyagi H. Local search methods for tree chromosome structure in a GA to identify functions. Trans Inst Electr Eng Jpn C. 2006;126(1):123-131. (in Japanese).

8 Vaddireddy H, Rasheed A, Staples AE, San O. Feature engineering and symbolic regression methods for detecting hidden physics from sparse sensor observation data. Phys Fluids. 2020;32(1):015113.

9 Suzuki T. Input motion inversion in elasto-plastic soil model by using modal iterative error correction method. J Struct Constr Eng (Transactions of AIJ). 2018;83(749):1021-1029. (in Japanese).

10 Owen DRJ, Hinton E. Finite Elements in Plasticity: Theory and Practice. Pineridge Press Limited; 1980.

11 Suzuki T. Generation of simulated earthquake motions considering actual earthquake phase and multi target response spectrums. J Struct Constr Eng (Transactions of AIJ). 2019;84(760):811-818. (in Japanese).

12 Suzuki T. Analytical study on web imperfection design of shear panel damper for mechanical performance control. J Struct Constr Eng (Transactions of AIJ). 2019;84(765):1401-1409. (in Japanese).

13 Suzuki T. Identification of rotational input motion and damping ratio using horizontal acceleration records. J Struct Constr Eng (Transactions of AIJ). 2020;85(767):51-60. (in Japanese).

14 Suzuki T. Creating smaller neural networks by applying seismic waveform processing. The 34th Annual Conference of the Japanese Society for Artificial Intelligence, 3Rin4-21, 2020.

15 Timoshenko S. Strength of Materials, Part 1: Elementary Theory and Problems. 3rd ed. CBS; 2002.

16 Suzuki T. Mode selection method in modal iterative error correction for stabilization of convergence. J Struct Constr Eng (Transactions of AIJ). 2019;84(756):195-203. (in Japanese).

How to cite this article: Suzuki T. Equation discovery using mode extraction via singular value decomposition. Jpn Archit Rev. 2023;6:e12327. https://doi.org/10.1002/2475-8876.12327
