

4.4 Numerical experiments

filter to locally smooth (denoise) the curves corresponding to each estimate θ_{i,e} (which corresponds to refining our phase estimates through GPR filtering), and (2) starting with a larger α (to decrease interference from other modes/overtones) and slowly reducing its value in the optional final refinement loop (to further localize our estimates after other components, and hence interference, have been mostly eliminated).

Mode | ‖v_{i,e} − v_i‖_{L²}/‖v_i‖_{L²} | ‖v_{i,e} − v_i‖_{L∞}/‖v_i‖_{L∞} | ‖a_{i,e} − a_i‖_{L²}/‖a_i‖_{L²} | ‖θ_{i,e} − θ_i‖_{L²}
i = 1 | 1.00×10⁻⁸ | 2.40×10⁻⁸ | 7.08×10⁻⁹ | 6.52×10⁻⁹
i = 2 | 2.74×10⁻⁷ | 2.55×10⁻⁷ | 1.87×10⁻⁸ | 2.43×10⁻⁷
i = 3 | 2.37×10⁻⁷ | 3.67×10⁻⁷ | 1.48×10⁻⁷ | 1.48×10⁻⁷

Table 4.2: Signal component recovery errors in the triangle base waveform example over [−1/3, 1/3].

EKG wave example

The base waveform is the EKG wave displayed in Figure 4.1. We use the same discrete mesh as in the triangle case. Here, we took α = 25 in the loop corresponding to steps 1 to 19 and slowly decreased it to 15 in the final loop corresponding to steps 22 to 28. The amplitudes and frequencies of each of the modes are shown in Figure 4.3, while the recovery error of each mode, as well as of their amplitude and phase functions, is shown over the whole interval [−1, 1] and over the interior third [−1/3, 1/3] in Tables 4.3 and 4.4, respectively. Within the interior third of the interval, amplitude and phase relative errors are on the order of 10⁻⁴ to 10⁻⁵ in this setting. However, over [−1, 1], the mean errors are more substantial, with amplitude and phase estimates in the 10⁻¹ to 10⁻³ range. Note the high error rates in L∞ stemming from errors in the placement of the tallest peak (the region around which is known as the R wave in the EKG community). In the center third of the interval, v_{i,e} and v_i are visually indistinguishable due to the small recovery errors.
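For concreteness, the relative error metrics reported in Tables 4.2–4.4 can be computed as follows. This is a minimal sketch in which the arrays v_true, v_est and the grid t are our own placeholder names, not notation from [102].

```python
import numpy as np

def recovery_errors(v_true, v_est, t, lo=-1/3, hi=1/3):
    """Relative L2 and L-infinity recovery errors of an estimated
    component, restricted to the subinterval [lo, hi]."""
    mask = (t >= lo) & (t <= hi)
    diff = v_est[mask] - v_true[mask]
    rel_l2 = np.linalg.norm(diff) / np.linalg.norm(v_true[mask])
    rel_linf = np.max(np.abs(diff)) / np.max(np.abs(v_true[mask]))
    return rel_l2, rel_linf
```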

Mode | ‖v_{i,e} − v_i‖_{L²}/‖v_i‖_{L²} | ‖v_{i,e} − v_i‖_{L∞}/‖v_i‖_{L∞} | ‖a_{i,e} − a_i‖_{L²}/‖a_i‖_{L²} | ‖θ_{i,e} − θ_i‖_{L²}
i = 1 | 5.66×10⁻² | 1.45×10⁻¹ | 4.96×10⁻³ | 8.43×10⁻³
i = 2 | 4.61×10⁻² | 2.39×10⁻¹ | 2.35×10⁻² | 1.15×10⁻²
i = 3 | 1.34×10⁻¹ | 9.39×10⁻¹ | 9.31×10⁻³ | 2.69×10⁻²

Table 4.3: Signal component recovery errors on [−1, 1] in the EKG base waveform example.

Mode | ‖v_{i,e} − v_i‖_{L²}/‖v_i‖_{L²} | ‖v_{i,e} − v_i‖_{L∞}/‖v_i‖_{L∞} | ‖a_{i,e} − a_i‖_{L²}/‖a_i‖_{L²} | ‖θ_{i,e} − θ_i‖_{L²}
i = 1 | 1.80×10⁻⁴ | 3.32×10⁻⁴ | 3.52×10⁻⁵ | 2.85×10⁻⁵
i = 2 | 4.35×10⁻⁴ | 5.09×10⁻⁴ | 3.35×10⁻⁵ | 7.18×10⁻⁵
i = 3 | 3.63×10⁻⁴ | 1.08×10⁻³ | 7.23×10⁻⁵ | 6.26×10⁻⁵

Table 4.4: Signal component recovery errors on [−1/3, 1/3] in the EKG base waveform example.

Chapter 5

ITERATED MICRO-LOCAL KERNEL MODE DECOMPOSITION FOR UNKNOWN BASE WAVEFORMS

We continue our discussion of KMD techniques by examining their application to an extension of the original mode recovery problem, Problem 7. We generalize the problem to the case where the base waveform of each mode is unknown [102, Sec. 9]; the generalized problem is formally stated below as Problem 9. Previously, in Chapter 4, we discussed how GPR can be applied to learn the instantaneous amplitudes and phases of each mode.

In the context of the unknown waveform problem, we will introduce the micro-local waveform KMD in Section 5.1, which again utilizes GPR and is able to estimate the base waveform of each mode.

Problem 9. For m ∈ ℕ*, let a₁, …, a_m be piecewise smooth functions on [−1, 1], let θ₁, …, θ_m be piecewise smooth functions on [−1, 1] such that the instantaneous frequencies θ̇ᵢ are strictly positive and well separated, and let y₁, …, y_m be square-integrable 2π-periodic functions. Assume that m and the aᵢ, θᵢ, yᵢ are all unknown.

Given the observation

    v(t) = Σ_{i=1}^{m} aᵢ(t) yᵢ(θᵢ(t)),  t ∈ [−1, 1],   (5.0.1)

recover the modes vᵢ(t) := aᵢ(t) yᵢ(θᵢ(t)).

To avoid ambiguities caused by overtones of the unknown waveforms yᵢ, we will assume that the corresponding functions (kθ̇ᵢ(t))_{t∈[−1,1]} and (k′θ̇_{i′}(t))_{t∈[−1,1]} are distinct for i ≠ i′ and k, k′ ∈ ℕ*; that is, they may be equal for some t but not for all t. We represent the i-th base waveform yᵢ through its Fourier series

    yᵢ(t) = cos(t) + Σ_{k=2}^{k_max} ( c_{i,(k,c)} cos(kt) + c_{i,(k,s)} sin(kt) ),   (5.0.2)

which, without loss of generality, has been scaled and translated so that the k = 1 cosine coefficient is one and the k = 1 sine coefficient vanishes (the amplitude can be absorbed into aᵢ and phase shifts into θᵢ). Moreover, since we operate in a discrete setting, we also truncate the series at a finite level k_max, which is naturally bounded by the inverse of the resolution of the discretization in time.

Figure 5.1: [102, Fig. 30], (1) signal v (the signal is defined over [−1, 1] but displayed over [0, 0.4] for visibility) (2) instantaneous frequencies ωᵢ := θ̇ᵢ (3) amplitudes aᵢ (4, 5, 6) modes v₁, v₂, v₃ over [0, 0.4] (mode plots have also been zoomed in for visibility).

Figure 5.2: [102, Fig. 31], illustrations showing (1) y₁ (2) y₂ (3) y₃.

To illustrate our approach, we consider the signal v = v₁ + v₂ + v₃ and its corresponding modes vᵢ := aᵢ(t) yᵢ(θᵢ(t)) displayed in Figure 5.1, where the corresponding base waveforms y₁, y₂, and y₃ are shown in Figure 5.2 and described in Section 5.3.

5.1 Micro-local waveform KMD

We now describe the micro-local waveform KMD [102, Sec. 9.1], Algorithm 4, which takes as inputs a time τ, estimated instantaneous amplitude and phase functions t → a(t), θ(t), and a signal v, and outputs an estimate of the waveform y(t) associated with the phase function θ. The proposed approach is a direct extension of the one presented in Section 4.2; the shaded part of Figure 5.3 shows the new block added to Algorithm 3, the algorithm designed for the case when waveforms are non-trigonometric and known. As described below, this new block produces an estimator y_{i,e} of the waveform yᵢ from an estimate θ_{i,e} of the phase θᵢ.

Figure 5.3: [102, Fig. 32], high-level structure of Algorithm 4 for the case when the waveforms are unknown.

Given α > 0, τ ∈ [−1, 1], and a differentiable function t → θ(t), define the Gaussian process

    ξ^y_{τ,θ}(t) = e^{−(θ̇(τ)(t−τ)/α)²} [ X^y_{1,c} cos(θ(t)) + Σ_{k=2}^{k_max} ( X^y_{k,c} cos(kθ(t)) + X^y_{k,s} sin(kθ(t)) ) ],   (5.1.1)

where X^y_{1,c}, X^y_{k,c}, and X^y_{k,s} are independent N(0, 1) random variables. Let

    v_τ(t) := e^{−(θ̇(τ)(t−τ)/α)²} v(t),  τ ∈ [−1, 1],   (5.1.2)

be the windowed signal, and define

    Z^y_{k,j}(τ, θ, v) := lim_{σ↓0} E[ X^y_{k,j} | ξ^y_{τ,θ} + ξ_σ = v_τ ]   (5.1.3)

(where ξ_σ denotes white noise with variance σ², as in Chapter 4), and, for k ∈ {2, …, k_max} and j ∈ {c, s}, let

    c_{k,j}(τ, θ, v) := Z^y_{k,j}(τ, θ, v) / Z^y_{1,c}(τ, θ, v).   (5.1.4)
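On a discrete time grid, the σ ↓ 0 limit in (5.1.3) reduces to an ordinary least-squares projection of the windowed signal onto the windowed basis functions, since the X^y_{k,j} are standard Gaussian coefficients. The following sketch makes this concrete; the function and variable names are ours, and we assume a uniform grid t, an estimated phase array theta = θ(t), and a scalar dtheta_tau = θ̇(τ).

```python
import numpy as np

def microlocal_Z(tau, theta, dtheta_tau, v, t, alpha, k_max):
    """Posterior means Z_{k,j} of (5.1.3) in the sigma -> 0 limit,
    computed as the least-squares fit of the windowed signal by the
    windowed basis w(t)cos(k theta(t)), w(t)sin(k theta(t))."""
    w = np.exp(-((dtheta_tau * (t - tau) / alpha) ** 2))  # Gaussian window
    cols = [w * np.cos(theta)]                            # k = 1 cosine
    for k in range(2, k_max + 1):
        cols.append(w * np.cos(k * theta))
        cols.append(w * np.sin(k * theta))
    A = np.stack(cols, axis=1)
    Z, *_ = np.linalg.lstsq(A, w * v, rcond=None)         # sigma -> 0 limit
    return Z  # Z[0] = Z_{1,c}; overtone ratios (5.1.4): c_{k,j} = Z[m] / Z[0]
```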

When the assumed phase function θ := θ_{i,e} is close to the phase function θᵢ of the i-th mode of the signal v in the expansion (5.0.1), c_{k,j}(τ, θ_{i,e}, v) yields an estimate of the Fourier coefficient c_{i,(k,j)} in (5.0.2) of the i-th base waveform yᵢ at time t = τ. This waveform recovery is susceptible to error when there is interference in the overtone frequencies, that is, for the values of τ at which k₁θ̇_{i₁}(τ) ≈ k₂θ̇_{i₂}(τ) for i₁ < i₂.

However, since the coefficient c_{i,(k,j)} is independent of time, we can overcome this by computing c_{k,j}(τ, θ_{i,e}, v) at each time τ and taking the most common approximate value, as follows. Let T ⊂ [−1, 1] be the finite set of values of τ in the numerical discretization of the time axis, with N := |T| elements. For an interval I ⊂ ℝ, let

    T_I := {τ ∈ T | c_{k,j}(τ, θ_{i,e}, v) ∈ I},   (5.1.5)

and let N_I := |T_I| denote the number of elements of T_I. Let I_max be a maximizer of the function I → N_I over intervals of fixed width L, and define the estimate

    c_{k,j}(θ_{i,e}, v) := (1/N_{I_max}) Σ_{τ ∈ T_{I_max}} c_{k,j}(τ, θ_{i,e}, v)  if N_{I_max}/N ≥ 0.05,  and  c_{k,j}(θ_{i,e}, v) := 0  if N_{I_max}/N < 0.05,   (5.1.6)

of the Fourier coefficient c_{i,(k,j)}, i.e., the average of the values of c_{k,j}(τ, θ_{i,e}, v) over τ ∈ T_{I_max}. The interpretation of the cutoff 0.05 is as follows: if N_{I_max}/N is small, then there is interference in the overtones at all times in [−1, 1], and no information may be obtained about the corresponding Fourier coefficient. When the assumed phase function is near that of the lowest-frequency mode v₁, which we write θ := θ_{1,e}, Figures 5.4.2 and 5.4.4 show zoomed-in histograms of the functions τ → c_{(3,c)}(τ, θ_{1,e}, v) and τ → c_{(3,s)}(τ, θ_{1,e}, v) displayed in Figures 5.4.1 and 5.4.3.

Figure 5.4: [102, Fig. 33], (1) a plot of the function τ → c_{(3,c)}(τ, θ_{1,e}, v) (2) a histogram (cropping outliers), with bin width 0.002, of the c_{(3,c)}(τ, θ_{1,e}, v) values; the true value c_{1,(3,c)} is 1/9 since y₁ is a triangle wave (3) a plot of the function τ → c_{(3,s)}(τ, θ_{1,e}, v) (4) a histogram (cropping outliers), with bin width 0.002, of the c_{(3,s)}(τ, θ_{1,e}, v) values; the true value c_{1,(3,s)} of this overtone is 0.
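The estimator (5.1.6) is essentially a mode of the empirical distribution of the c_{k,j}(τ, θ_{i,e}, v) values, such as those histogrammed in Figure 5.4. A minimal sketch follows; the function name is ours, and intervals are anchored at sample values, a standard discrete approximation of the maximizing interval I_max.

```python
import numpy as np

def most_common_value(c_vals, L=0.002, cutoff=0.05):
    """Sketch of (5.1.6): average the values falling in the width-L
    interval containing the most samples; return 0 if that interval
    captures fewer than `cutoff` of all samples."""
    c = np.sort(np.asarray(c_vals))
    N = len(c)
    # for each left endpoint c[i], count samples within [c[i], c[i] + L]
    right = np.searchsorted(c, c + L, side="right")
    counts = right - np.arange(N)
    i = int(np.argmax(counts))
    if counts[i] / N < cutoff:
        return 0.0  # interference at all times: no information recovered
    return float(np.mean(c[i:right[i]]))
```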

On the interval width L. In our numerical experiments, the recovered modes and waveforms show little sensitivity to the choice of L. In particular, we set L to 0.002, whereas widths between 0.001 and 0.01 yield similar results. The rationale for the rough selection of the value of L is as follows. Suppose v = cos(ωt) and v′ = v + cos(1.5ωt). Define the quantity

    max_τ | c_{2,c}(τ, θ, v′) − c_{2,c}(τ, θ, v) |,   (5.1.7)

with the intuition that it approximates the maximum corruption of the estimated first overtone by the cos(1.5ωt) term. This quantity provides a good choice for L; it depends mainly on the selection of α and only marginally on ω. For our selection of α = 10, we numerically found its value to be approximately 0.002.

5.2 Iterated micro-local KMD with unknown waveforms algorithm

Algorithm 4 Iterated micro-local KMD with unknown waveforms.
1: i ← 1 and v⁽¹⁾ ← v
2: while true do
3:   if θ_low(v⁽ⁱ⁾) = ∅ then
4:     break loop
5:   else
6:     θ_{i,e} ← θ_low(v⁽ⁱ⁾)
7:     y_{i,e} ← cos(t)
8:   end if
9:   a_{i,e}(t) ← 0
10:  repeat
11:    for l in {1, …, i} do
12:      v_{l,res} ← v − a_{l,e} ȳ_{l,e}(θ_{l,e}) − Σ_{k≠l, k≤i} a_{k,e} y_{k,e}(θ_{k,e})
13:      a_{l,e}(τ) ← a(τ, θ_{l,e}, v_{l,res}) / c₁
14:      θ_{l,e}(τ) ← θ_{l,e}(τ) + (1/2) δθ(τ, θ_{l,e}, v_{l,res})
15:      c_{l,(k,j),e} ← c_{k,j}(θ_{l,e}, v_{l,res})
16:      y_{l,e}(t) ← cos(t) + Σ_{k=2}^{k_max} ( c_{l,(k,c),e} cos(kt) + c_{l,(k,s),e} sin(kt) )
17:    end for
18:  until sup_{l,τ} |δθ(τ, θ_{l,e}, v_{l,res})| < ε₁
19:  v⁽ⁱ⁺¹⁾ ← v − Σ_{j≤i} a_{j,e} y_{j,e}(θ_{j,e})
20:  i ← i + 1
21: end while
22: m ← i − 1
23: if refine_final = True then
24:   repeat
25:     for i in {1, …, m} do
26:       v_{i,res} ← v − a_{i,e} ȳ_{i,e}(θ_{i,e}) − Σ_{j≠i} a_{j,e} y_{j,e}(θ_{j,e})
27:       a_{i,e}(τ) ← a(τ, θ_{i,e}, v_{i,res})
28:       θ_{i,e}(τ) ← θ_{i,e}(τ) + (1/2) δθ(τ, θ_{i,e}, v_{i,res})
29:       c_{i,(k,j),e} ← c_{k,j}(θ_{i,e}, v − Σ_{j≠i} a_{j,e} y_{j,e}(θ_{j,e}))
30:       y_{i,e}(t) ← cos(t) + Σ_{k=2}^{k_max} ( c_{i,(k,c),e} cos(kt) + c_{i,(k,s),e} sin(kt) )
31:     end for
32:   until sup_{i,τ} |δθ(τ, θ_{i,e}, v_{i,res})| < ε₂
33: end if
34: Return the modes v_{i,e} ← a_{i,e}(t) y_{i,e}(θ_{i,e}(t)) for i = 1, …, m

Except for the steps discussed in Section 5.1, Algorithm 4 [102, Sec. 9.2] is identical to Algorithm 3. As illustrated in Figure 5.3, we first identify the lowest instantaneous frequency and initialize the corresponding waveform estimate to the cosine (steps 6 and 7 of Algorithm 4). Next, from steps 10 to 18, we execute a refinement loop similar to that of Algorithm 3, with the addition of an application of the micro-local waveform KMD in steps 15 and 16 to estimate the base waveforms. Finally, once each mode has been identified, we again apply waveform estimation in steps 29–30 (after nearly eliminating the other modes, and hence the interference in overtones, for higher accuracy).
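The following Python skeleton mirrors this structure (steps 1–22 of Algorithm 4). It is a sketch only: theta_low, amp_est, dtheta_est, and wave_coeffs are hypothetical stand-ins for the estimators θ_low, a(τ, θ, v), δθ(τ, θ, v), and c_{k,j}(θ, v) constructed in Chapter 4 and Section 5.1, and ȳ_{l,e} is read as y_{l,e} − cos.

```python
import numpy as np

def y_from_coeffs(c):
    """Assemble (5.0.2) from a dict c[(k, 'c')], c[(k, 's')] of coefficients."""
    def y(s):
        out = np.cos(s)
        for (k, j), ckj in c.items():
            out += ckj * (np.cos(k * s) if j == "c" else np.sin(k * s))
        return out
    return y

def iterated_kmd(v, t, eps1=1e-6):
    thetas, amps, ys = [], [], []          # theta_{i,e}, a_{i,e}, y_{i,e}
    residual = v.copy()
    while (th := theta_low(residual, t)) is not None:       # steps 2-8
        thetas.append(th); amps.append(np.zeros_like(t)); ys.append(np.cos)
        while True:                                         # steps 10-18
            max_corr = 0.0
            for l in range(len(thetas)):                    # step 11
                others = sum(a * y(p) for j, (a, y, p)
                             in enumerate(zip(amps, ys, thetas)) if j != l)
                # step 12: remove the other modes and the overtones of mode l
                v_res = v - others - amps[l] * (ys[l](thetas[l]) - np.cos(thetas[l]))
                amps[l] = amp_est(t, thetas[l], v_res)      # step 13
                dth = dtheta_est(t, thetas[l], v_res)       # step 14
                thetas[l] = thetas[l] + 0.5 * dth
                ys[l] = y_from_coeffs(wave_coeffs(thetas[l], v_res))  # steps 15-16
                max_corr = max(max_corr, np.max(np.abs(dth)))
            if max_corr < eps1:                             # step 18
                break
        residual = v - sum(a * y(p) for a, y, p in zip(amps, ys, thetas))  # step 19
    return thetas, amps, ys                                 # m = len(thetas)
```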

5.3 Numerical experiments

To illustrate this learning of the base waveform of each mode, we take v(t) = Σ_{i=1}^{3} aᵢ(t) yᵢ(θᵢ(t)), where the lowest-frequency mode a₁(t)y₁(θ₁(t)) has the (unknown) triangle waveform y₁ of Figure 4.1 [102, Sec. 9.3]. We determine the waveforms yᵢ, i = 2, 3, randomly by setting c_{i,(k,j)} to be zero with probability 1/2 or to be a random sample from N(0, 1/k⁴) with probability 1/2, for k ∈ {2, …, 7} and j ∈ {c, s}. The waveforms y₁, y₂, y₃ thus obtained are illustrated in Figure 5.2.

The modes v₁, v₂, v₃, their amplitudes, and their instantaneous frequencies are shown in Figure 5.1.
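The random waveform construction is simple to reproduce. A sketch follows; the function name and seed are ours.

```python
import numpy as np

def random_waveform(k_max=7, seed=0):
    """Draw a waveform as in Section 5.3: each overtone coefficient
    c_{(k,c)}, c_{(k,s)}, k = 2..7, is 0 with probability 1/2 and is
    otherwise sampled from N(0, 1/k^4) (standard deviation 1/k^2)."""
    rng = np.random.default_rng(seed)
    c = {(k, j): 0.0 if rng.random() < 0.5 else rng.normal(0.0, 1.0 / k**2)
         for k in range(2, k_max + 1) for j in ("c", "s")}
    def y(s):
        out = np.cos(s)
        for (k, j), ckj in c.items():
            out += ckj * (np.cos(k * s) if j == "c" else np.sin(k * s))
        return out
    return y, c
```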

Mode | ‖v_{i,e} − v_i‖_{L²}/‖v_i‖_{L²} | ‖v_{i,e} − v_i‖_{L∞}/‖v_i‖_{L∞} | ‖a_{i,e} − a_i‖_{L²}/‖a_i‖_{L²} | ‖θ_{i,e} − θ_i‖_{L²} | ‖y_{i,e} − y_i‖_{L²}/‖y_i‖_{L²}
i = 1 | 6.31×10⁻³ | 2.39×10⁻² | 9.69×10⁻⁵ | 1.41×10⁻⁵ | 6.32×10⁻³
i = 2 | 3.83×10⁻⁴ | 1.08×10⁻³ | 5.75×10⁻⁵ | 1.16×10⁻⁴ | 3.76×10⁻⁴
i = 3 | 3.94×10⁻⁴ | 1.46×10⁻³ | 9.53×10⁻⁵ | 6.77×10⁻⁵ | 3.80×10⁻⁴

Table 5.1: Signal component recovery errors over [−1, 1] when the base waveforms are unknown.

We use the same mesh and the same α values as in Section 4.4. The main source of error in the recovery of the first mode's base waveform stems from the fact that a triangle wave has an infinite number of overtones, while in our implementation we estimate only the first 15 overtones. Indeed, the L² error of approximating the triangle wave by its first 16 tones is 3.57×10⁻⁴, while the full recovery errors are presented in Table 5.1. We omit the plots of the y_{i,e} as they are visually indistinguishable from those of the yᵢ. Note that the errors improve only slightly away from the borders, as the majority of the error is accounted for by the waveform recovery error.

5.4 Further work in kernel mode decomposition

Micro-local kernel mode decomposition has also been shown to produce mode decompositions in cases with crossing mode frequencies, vanishing amplitudes, and noisy observations. This setting is summarized by Problem 10, and an illustrative example is given in Example 5.4.1 and Figure 5.5.

Problem 10. For m ∈ ℕ*, let a₁, …, a_m be piecewise smooth functions on [−1, 1], and let θ₁, …, θ_m be strictly increasing functions on [−1, 1] such that, for ε > 0 and δ ∈ [0, 1), the length of the set of t with θ̇ᵢ(t)/θ̇ⱼ(t) ∈ [1 − ε, 1 + ε] is less than δ. Assume that m and the aᵢ, θᵢ are unknown, and that the square-integrable 2π-periodic base waveform y is known. Given the observation

    v(t) = Σ_{i=1}^{m} aᵢ(t) y(θᵢ(t)) + v_σ(t),  t ∈ [−1, 1],

where v_σ is a realization of white noise with variance σ², recover the modes vᵢ(t) := aᵢ(t) y(θᵢ(t)).

Example 5.4.1. Consider the problem of recovering the modes of the signal v = v₁ + v₂ + v₃ + v_σ shown in Figure 5.5. Each mode has a triangular base waveform. In this example, v₃ has the highest frequency and its amplitude vanishes for t > −0.25. The frequencies of v₁ and v₂ cross around t = 0.25. Here v_σ ∼ N(0, σ²δ(t − s)) is white noise with standard deviation σ = 0.5. While the signal-to-noise ratio is Var(v₁ + v₂ + v₃)/Var(v_σ) = 13.1, the SNR against each of the modes, Var(vᵢ)/Var(v_σ), i = 1, 2, 3, is 2.7, 7.7, and 10.7, respectively.

Figure 5.5: [102, Fig. 34], (1) signal v (2) instantaneous frequencies ωᵢ := θ̇ᵢ (3) amplitudes aᵢ (4, 5, 6) modes v₁, v₂, v₃.
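The SNR figures quoted in Example 5.4.1 follow directly from sample variances. A minimal sketch follows; the mode arrays are assumed given, and all names are ours.

```python
import numpy as np

def snr_report(v1, v2, v3, sigma=0.5, seed=0):
    """Total and per-mode variance ratios against a white-noise
    realization of standard deviation sigma, as in Example 5.4.1."""
    rng = np.random.default_rng(seed)
    v_noise = rng.normal(0.0, sigma, size=v1.shape)
    total = np.var(v1 + v2 + v3) / np.var(v_noise)
    per_mode = [np.var(vi) / np.var(v_noise) for vi in (v1, v2, v3)]
    return total, per_mode
```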

At a high level, the problem is approached by iterating the following process. Throughout the algorithm, the sets V and V_seg are maintained, containing the identified full modes and mode segments (defined below), respectively. First, we identify the lowest instantaneous frequency ω_low(τ) at each τ ∈ [−1, 1]. Due to crossings of the modes' instantaneous frequencies and vanishing amplitudes, this lowest frequency may correspond to multiple modes. We also determine the continuity of θ_low(τ) and ω_low(τ), and cut the domain at discontinuities. This leads to mode fragments (supported between the domain cuts), which correspond to modes potentially identified only over a subset of the domain [−1, 1]. It is then checked whether each mode fragment can be extended with continuous θ_low or ω_low, and it is extended if possible, leading to mode segments. The user then has the option to disregard, join, or pass segments on to the next iteration, and the sets V and V_seg are updated accordingly. The identified modes in V are refined with a micro-local KMD loop. Finally, the identified modes and mode segments are peeled from the signal, and the algorithm is iterated on the peeled signal. Further details, examples, and results are presented in [102, Sec. 10].

Chapter 6

KERNEL FLOWS

As introduced in [103], the Kernel Flow (KF) algorithm is a method for kernel selection/design in kriging/Gaussian Process Regression (GPR). It operates on the principle that a kernel is a good model of the data if it can accurately make predictions on one subset of the data by observing another subset. We consider the supervised learning problem of approximating an unknown function u mapping Ω to ℝ from the input/output dataset (t_i, y_i)_{1≤i≤N} (where u(t_i) = y_i). We define the vectors of input and output data as D = (t_i)_i ∈ Ω^N and Y = (y_i)_i ∈ ℝ^N. Any non-degenerate kernel k(t, t′) can be used to approximate u with the interpolant

    u†(t) = k(t, D) K⁻¹ Y,   (6.0.1)

writing k(t, D) = (k(t, t_i))_i ∈ ℝ^{1×N} and K = (k(t_i, t_j))_{i,j} ∈ ℝ^{N×N}. The kernel selection problem concerns the identification of a good kernel for performing this interpolation on a particular dataset. The KF algorithm's approach to this problem is to use the loss of accuracy incurred by removing half of the dataset as a loss function for kernel selection. We will present two variants of the KF algorithm, parametric and non-parametric, beginning with the former [103, Sec. 4], which is outlined in Algorithm 5.
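As a concrete illustration, the interpolant (6.0.1) is a few lines of linear algebra. In the sketch below, the Gaussian kernel is only a placeholder family, and the small nugget is a standard numerical stabilization of the solve (our addition, not part of (6.0.1)).

```python
import numpy as np

def gaussian_kernel(s, t, gamma=1.0):
    """A standard RBF kernel, used here only as a placeholder family."""
    return np.exp(-gamma * (s[:, None] - t[None, :]) ** 2)

def interpolant(t_query, D, Y, kernel, nugget=1e-10):
    """Kernel interpolant (6.0.1): u(t) = k(t, D) K^{-1} Y, with D, Y 1-D arrays."""
    K = kernel(D, D) + nugget * np.eye(len(D))
    return kernel(t_query, D) @ np.linalg.solve(K, Y)

# usage sketch
D = np.linspace(0.0, 1.0, 20)
Y = np.sin(2 * np.pi * D)
u = interpolant(np.linspace(0.0, 1.0, 200), D, Y, gaussian_kernel)
```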

6.1 Parametric KF algorithm

Algorithm 5 Parametric KF algorithm
1: Input: dataset (t_i, y_i)_{1≤i≤N}, kernel family k_θ, initial parameter θ₀
2: θ ← θ₀
3: repeat
4:   Randomly select {s_f(1), …, s_f(N_f)} from {1, …, N} without replacement.
5:   Df ← (t_{s_f(i)})_{1≤i≤N_f} and Yf ← (y_{s_f(i)})_{1≤i≤N_f}.
6:   Randomly select {s_c(1), …, s_c(N_c)} from {s_f(1), …, s_f(N_f)} without replacement.
7:   Dc ← (t_{s_c(i)})_{1≤i≤N_c} and Yc ← (y_{s_c(i)})_{1≤i≤N_c}.
8:   θ ← θ − ε ∇_θ ρ(θ, Df, Yf, Dc, Yc)
9: until end criterion is met
10: Return the optimized kernel parameter θ* ← θ

As inputs, in step 1, the parametric variant of the KF algorithm takes a training dataset (t_i, y_i)_i, a parametric family of kernels k_θ, and an initial parameter θ₀. The kernel parameter θ is first initialized to θ₀ in step 2. The main iterative process runs between steps 3 and 9, beginning with the random selection of size-N_f subvectors Df and Yf of D and Y (through uniform sampling without replacement in the index set {1, …, N}) in steps 4 and 5. This selection is subsampled further into size-N_c subvectors Dc and Yc of Df and Yf (by selecting, uniformly at random and without replacement, N_c of the indices defining Df) in steps 6 and 7.

Finally, in step 8, we update the kernel parameter θ by gradient descent on the loss ρ. We define ρ(θ, Df, Yf, Dc, Yc) to be the squared relative error, in the RKHS norm ‖·‖_{k_θ} defined by k_θ (convergence in which implies pointwise convergence), between the interpolants u^{†,f} and u^{†,c} obtained from the two nested subsets of the dataset and the kernel k_θ, where u^{†,f}(t) = k_θ(t, Df) k_θ(Df, Df)⁻¹ Yf and u^{†,c}(t) = k_θ(t, Dc) k_θ(Dc, Dc)⁻¹ Yc. That is,

    ρ(θ, Df, Yf, Dc, Yc) := ‖u^{†,f} − u^{†,c}‖²_{k_θ} / ‖u^{†,f}‖²_{k_θ} = 1 − (Yc,ᵀ k_θ(Dc, Dc)⁻¹ Yc) / (Yf,ᵀ k_θ(Df, Df)⁻¹ Yf),   (6.1.1)

where the representation on the right-hand side is what enables the computation of ρ [103, Prop. 3.1]. Note that the loss ρ is doubly randomized through the selection of the batch (Df, Yf) and the sub-batch (Dc, Yc). The gradient of ρ with respect to θ is then computed and θ is updated as θ ← θ − ε∇_θ ρ. This process is iterated until an ending criterion is satisfied, such as a fixed number of iteration steps or ρ < ρ_stop, and the optimized kernel parameter θ* is returned. The corresponding kernel k_{θ*} can then be used to interpolate testing points. Since each update descends a stochastic estimate of ρ, the KF algorithm is a stochastic gradient descent algorithm.

The l²-norm variant. In the application of the KF algorithm to neural networks, we will consider the l²-norm variant of this algorithm (introduced in [103, Sec. 10]), in which the instantaneous loss ρ in (6.1.1) is replaced by the squared Euclidean error e₂ := ‖Yf − u^{†,c}(Df)‖₂² of u^{†,c} in predicting the labels Yf (here ‖·‖₂ is the Euclidean l² norm), i.e.,

    e₂(θ, Df, Yf, Dc, Yc) := ‖Yf − k_θ(Df, Dc) k_θ(Dc, Dc)⁻¹ Yc‖₂².   (6.1.2)
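A corresponding sketch of (6.1.2), usable as a drop-in replacement for rho in the previous snippet; the nugget is again our own numerical stabilization.

```python
import numpy as np

def e2(theta, Df, Yf, Dc, Yc, kernel, nugget=1e-10):
    """l2-norm KF loss (6.1.2): squared Euclidean error of the
    sub-batch interpolant when predicting the batch labels Yf."""
    Kcc = kernel(Dc, Dc, theta) + nugget * np.eye(len(Dc))
    pred = kernel(Df, Dc, theta) @ np.linalg.solve(Kcc, Yc)
    return float(np.sum((Yf - pred) ** 2))
```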

A simple PDE model

To motivate, illustrate, and study the parametric KF algorithm, it is useful to start with an application [103, Sec. 5] to the following simple PDE model amenable to detailed analysis [101]. Let u be the solution of the second-order elliptic PDE

    −div(a(x)∇u(x)) = f(x),  x ∈ Ω;   u = 0 on ∂Ω,   (6.1.3)

with Ω ⊂ ℝ¹ an interval and a a uniformly elliptic symmetric matrix with entries in L∞(Ω). We write 𝓛 := −div(a∇·) for the corresponding linear bijection from H¹₀(Ω) to H⁻¹(Ω). In this simple application, we seek both to estimate the conductivity coefficient a and to recover the solution of (6.1.3) from the data (x_i, y_i)_{1≤i≤N} and the information u(x_i) = y_i. In this example, we use the kernel family {G_b | b ∈ L∞(Ω), essinf_Ω(b) > 0}, where each G_b is the Green's function of the operator −div(b∇·). It is known that kernel recovery with G_a is minimax optimal in the energy norm ‖·‖_a [101]. In what follows, we numerically demonstrate that the KF algorithm applied to this problem recovers a kernel G_{b*} such that b* ≈ a.

Figure 6.1: [103, Fig. 5], (1) a (2) f (3) u (4) ρ(a) and ρ(b) (where b ≡ 1) vs k (5) e(a) and e(b) vs k (6) 20 random realizations of ρ(a) and ρ(b) (7) 20 random realizations of e(a) and e(b).

Fig. 6.1 provides a numerical illustration of the setting of our example, where Ω is discretized over 2⁸ equally spaced interior points (with piecewise linear tent finite elements); Fig. 6.1.1–3 show a, f, and u. For k ∈ {1, …, 8} and i ∈ I(k) := {1, …, 2ᵏ − 1}, let x_i^(k) = i/2ᵏ and write v_b^(k) for the interpolation of the data (x_i^(k), u(x_i^(k)))_{i∈I(k)} using the kernel G_b (note that v_b^(8) = u). Let ‖v‖_b be the energy norm given by ‖v‖_b² = ∫_Ω (∇v)ᵀ b ∇v. Take b ≡ 1. Fig. 6.1.4 shows (in semilog scale) the values of

    ρ(a) = ‖v_a^(k) − v_a^(8)‖_a² / ‖v_a^(8)‖_a²  and  ρ(b) = ‖v_b^(k) − v_b^(8)‖_b² / ‖v_b^(8)‖_b²

vs k. Note that the value of the ratio ρ is much smaller when the kernel G_a is used for the interpolation of the data. The geometric decay ρ(a) ≤ C 2^{−2k} ‖f‖²_{L²(Ω)}/‖u‖_a² is well known and has been extensively studied in Numerical Homogenization [101]. Fig. 6.1.5 shows (in semilog scale) the values of the prediction errors e(a) and e(b) (vs k), defined (after normalization) to be proportional to ‖v_a^(k) − u‖_{L²(Ω)} and ‖v_b^(k) − u‖_{L²(Ω)}. Note again that the prediction error is much smaller when the kernel G_a is used for the interpolation.

Now, let us consider the case where the interpolation points form a random subset of the discretization points. Take N_f = 2⁷ and N_c = 2⁶. Let X = {x₁, …, x_{N_f}} be a subset of N_f distinct points of the discretization points {i/2⁸ | i ∈ I(8)}, sampled with uniform distribution. Let {z₁, …, z_{N_c}} be a subset of N_c distinct points of X, sampled with uniform distribution. Write v_b^f for the interpolation of the data (x_i, u(x_i)) using the kernel G_b, and write v_b^c for the interpolation of the data (z_i, u(z_i)) using the kernel G_b. Fig. 6.1.6 shows (in semilog scale) 20 independent random realizations (of the random subsets) of the values of ρ(a) = ‖v_a^f − v_a^c‖_a²/‖v_a^f‖_a² and ρ(b) = ‖v_b^f − v_b^c‖_b²/‖v_b^f‖_b². Fig. 6.1.7 shows (in semilog scale) 20 independent random realizations of the values of the prediction errors e(a) ∝ ‖u − v_a^c‖_{L²(Ω)} and e(b) ∝ ‖u − v_b^c‖_{L²(Ω)}. Note again that the values of ρ(a), e(a) are consistently and significantly lower than those of ρ(b), e(b).

Figure 6.2: [103, Fig. 6], (1) a and b for n = 1 (2) a and b for n = 350 (3) ρ(b) vs n (4) e(b) vs n.

Fig. 6.2 provides a numerical illustration of an implementation of Alg. 5 with N = N_f = 2⁷, N_c = 2⁶, and n indexing each iteration of the KF algorithm starting with n = 1. In this implementation, a, f, and u are as in Fig. 6.1.1–3. The training data corresponds to N points X = {x₁, …, x_N} uniformly sampled (without replacement) from {i/2⁸ | i ∈ I(8)}. Note that X remains fixed, and since N = N_f, the larger batch (as in step 4 of Alg. 5) is always selected as X. The purpose of the algorithm is to learn the kernel G_a in the set of kernels {G_{b_W} | W} parameterized by the vector W via

    log b_W(x) = Σ_{i=1}^{2⁶} ( W_i^c cos(2πix) + W_i^s sin(2πix) ).   (6.1.4)

Using n to label its progression, Alg. 5 is initialized at n = 1 with the guess b₀ ≡ 1 (i.e., W₀ ≡ 0) (Fig. 6.2.1). Fig. 6.2.2 shows the value of b after n = 350 iterations, which can be seen to approximate a. Fig. 6.2.3 shows the value of ρ(b) vs n. Fig. 6.2.4 shows the value of the prediction error e(b) ∝ ‖u − v_b^c‖_{L²(Ω)} vs n. The lack of smoothness of the plots of ρ(b) and e(b) vs n originates from the re-sampling of the sub-batch Z = {z₁, …, z_{N_c}} at each step n. Further details of this application of the KF algorithm can be found in [103, Sec. 5].

6.2 Non-parametric kernel flows

Recall that the parametric variant of the KF algorithm utilizes a parameterized family of kernels k_θ and optimizes the interpolation loss ρ with respect to θ; the kernel returned by the algorithm is then k_{θ*}, where θ* is the optimized kernel parameter. In contrast, the non-parametric version [103, Sec. 6] is initialized with a kernel k and learns kernels of the form k_F(t, t′) = k(F(t), F(t′)), where F: Ω → Ω is an arbitrary function. In this case,

    ρ(F, Df, Yf, Dc, Yc) := 1 − (Yc,ᵀ k_F(Dc, Dc)⁻¹ Yc) / (Yf,ᵀ k_F(Df, Df)⁻¹ Yf)   (6.2.1)

is optimized with respect to F. In practice, this is done by learning the ρ-minimizing deformations of the training inputs in Df and extending them to Ω by kernel interpolation. This leads to a deformation G_n, which is used to update F at each iteration according to F ← (I + εG_n) ∘ F, where I is the identity function. Further mathematical detail can be found in [103, Sec. 6]. We will next present numerical examples of the non-parametric KF algorithm.

The Swiss Roll cheesecake example

We examine the application of the KF algorithm to the Swiss Roll cheesecake [103, Sec. 7], as illustrated in Figure 6.3.1. The dataset inputs lie in Ω = ℝ² in the shape of two concentric spirals. Red points have label −1 and blue points have label 1. The purpose of this exposition is to illustrate the flow and the deformation of each point in the dataset. Hence, all datapoints are treated as training points and we do not introduce a testing dataset. We select N_f = N and N_c = N_f/2 and initialize
