Chapter IV: Iterated micro-local kernel mode decomposition for known base waveforms
4.4 Numerical experiments
filter to locally smooth (denoise) the curves corresponding to each estimate $\theta_{i,e}$ (which corresponds to refining our phase estimates through GPR filtering), and (2) starting with a larger $\alpha$ (to decrease interference from other modes/overtones) and slowly reducing its value in the optional final refinement loop (to further localize our estimates after other components, and hence interference, have been mostly eliminated).
Mode | $\|v_{i,e}-v_i\|_{L^2}/\|v_i\|_{L^2}$ | $\|v_{i,e}-v_i\|_{L^\infty}/\|v_i\|_{L^\infty}$ | $\|a_{i,e}-a_i\|_{L^2}/\|a_i\|_{L^2}$ | $\|\theta_{i,e}-\theta_i\|_{L^2}$
$i=1$ | $1.00\times 10^{-8}$ | $2.40\times 10^{-8}$ | $7.08\times 10^{-9}$ | $6.52\times 10^{-9}$
$i=2$ | $2.74\times 10^{-7}$ | $2.55\times 10^{-7}$ | $1.87\times 10^{-8}$ | $2.43\times 10^{-7}$
$i=3$ | $2.37\times 10^{-7}$ | $3.67\times 10^{-7}$ | $1.48\times 10^{-7}$ | $1.48\times 10^{-7}$

Table 4.2: Signal component recovery errors in the triangle base waveform example over $[-1/3, 1/3]$.
EKG wave example
The base waveform is the EKG wave displayed in Figure 4.1. We use the same discrete mesh as in the triangle case. Here, we took $\alpha = 25$ in the loop corresponding to steps 1 to 19 and slowly decreased it to 15 in the final loop corresponding to steps 22 to 28. The amplitudes and frequencies of each of the modes are shown in Figure 4.3, while the recovery errors of each mode, as well as of their amplitude and phase functions, are shown over the whole interval $[-1,1]$ and over the interior third $[-1/3, 1/3]$ in Tables 4.3 and 4.4, respectively. Within the interior third of the interval, amplitude and phase relative errors are found to be on the order of $10^{-4}$ to $10^{-5}$ in this setting. However, over $[-1,1]$, the mean errors are more substantial, with amplitude and phase estimates in the $10^{-1}$ to $10^{-3}$ range. Note the high $L^\infty$ error rates stemming from errors in the placement of the tallest peak (the region around which is known as the R wave in the EKG community). In the center third of the interval, $v_{i,e}$ and $v_i$ are visually indistinguishable due to the small recovery errors.
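These relative recovery errors are straightforward to compute from sampled estimates. Below is a minimal sketch, assuming a uniform grid so that discrete norms approximate the $L^2$ and $L^\infty$ norms; the grid, mode, and estimate used here are placeholders, not the data behind the tables.

```python
import numpy as np

def relative_errors(v_est, v_true, t, window=None):
    """Relative L2 and L-infinity recovery errors of an estimated mode,
    optionally restricted to a sub-interval such as [-1/3, 1/3]."""
    if window is not None:
        mask = (t >= window[0]) & (t <= window[1])
        v_est, v_true = v_est[mask], v_true[mask]
    rel_l2 = np.linalg.norm(v_est - v_true) / np.linalg.norm(v_true)
    rel_linf = np.abs(v_est - v_true).max() / np.abs(v_true).max()
    return rel_l2, rel_linf

t = np.linspace(-1.0, 1.0, 2001)
v = np.cos(16 * np.pi * t)                      # placeholder true mode
v_e = v + 1e-3 * np.sin(5 * np.pi * t)          # placeholder estimate
print(relative_errors(v_e, v, t))               # errors over [-1, 1]
print(relative_errors(v_e, v, t, window=(-1/3, 1/3)))  # interior third
```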
Mode | $\|v_{i,e}-v_i\|_{L^2}/\|v_i\|_{L^2}$ | $\|v_{i,e}-v_i\|_{L^\infty}/\|v_i\|_{L^\infty}$ | $\|a_{i,e}-a_i\|_{L^2}/\|a_i\|_{L^2}$ | $\|\theta_{i,e}-\theta_i\|_{L^2}$
$i=1$ | $5.66\times 10^{-2}$ | $1.45\times 10^{-1}$ | $4.96\times 10^{-3}$ | $8.43\times 10^{-3}$
$i=2$ | $4.61\times 10^{-2}$ | $2.39\times 10^{-1}$ | $2.35\times 10^{-2}$ | $1.15\times 10^{-2}$
$i=3$ | $1.34\times 10^{-1}$ | $9.39\times 10^{-1}$ | $9.31\times 10^{-3}$ | $2.69\times 10^{-2}$

Table 4.3: Signal component recovery errors on $[-1,1]$ in the EKG base waveform example.
Mode | $\|v_{i,e}-v_i\|_{L^2}/\|v_i\|_{L^2}$ | $\|v_{i,e}-v_i\|_{L^\infty}/\|v_i\|_{L^\infty}$ | $\|a_{i,e}-a_i\|_{L^2}/\|a_i\|_{L^2}$ | $\|\theta_{i,e}-\theta_i\|_{L^2}$
$i=1$ | $1.80\times 10^{-4}$ | $3.32\times 10^{-4}$ | $3.52\times 10^{-5}$ | $2.85\times 10^{-5}$
$i=2$ | $4.35\times 10^{-4}$ | $5.09\times 10^{-4}$ | $3.35\times 10^{-5}$ | $7.18\times 10^{-5}$
$i=3$ | $3.63\times 10^{-4}$ | $1.08\times 10^{-3}$ | $7.23\times 10^{-5}$ | $6.26\times 10^{-5}$

Table 4.4: Signal component recovery errors on $[-1/3, 1/3]$ in the EKG base waveform example.
Chapter 5
ITERATED MICRO-LOCAL KERNEL MODE DECOMPOSITION FOR UNKNOWN BASE WAVEFORMS
We continue our discussion of KMD techniques by examining their application to an extension of the original mode recovery problem, Problem 7. We generalize the problem to the case where the base waveforms of each mode are unknown [102, Sec. 9]; the generalized problem is formally stated below as Problem 9. Previously, in Chapter 4, we discussed how GPR can be applied to learn the instantaneous amplitudes and phases of each mode. In the context of the unknown waveform problem, we will introduce micro-local waveform KMD in Section 5.1, which again utilizes GPR and is able to estimate the waveforms of the modes.
Problem 9. For $m \in \mathbb{N}^*$, let $a_1, \ldots, a_m$ be piecewise smooth functions on $[-1,1]$, let $\theta_1, \ldots, \theta_m$ be piecewise smooth functions on $[-1,1]$ such that the instantaneous frequencies $\dot\theta_i$ are strictly positive and well separated, and let $y_1, \ldots, y_m$ be square-integrable $2\pi$-periodic functions. Assume that $m$ and the $a_i$, $\theta_i$, $y_i$ are all unknown. Given the observation
$$v(t) = \sum_{i=1}^{m} a_i(t)\, y_i\big(\theta_i(t)\big), \quad t \in [-1,1], \qquad (5.0.1)$$
recover the modes $v_i(t) := a_i(t)\, y_i\big(\theta_i(t)\big)$.
To avoid ambiguities caused by overtones with the unknown waveforms $y_i$, we will assume that the corresponding functions $(k\dot\theta_i)_{t\in[-1,1]}$ and $(k'\dot\theta_{i'})_{t\in[-1,1]}$ are distinct for $i \neq i'$ and $k, k' \in \mathbb{N}^*$; that is, they may be equal for some $t$ but not for all $t$. We represent the $i$-th base waveform $y_i$ through its Fourier series
$$y_i(t) = \cos(t) + \sum_{k=2}^{k_{\max}} \big( c_{i,(k,c)} \cos(kt) + c_{i,(k,s)} \sin(kt) \big), \qquad (5.0.2)$$
that, without loss of generality, has been scaled and translated. Moreover, since we operate in a discrete setting, we also truncate the series at a finite level $k_{\max}$, which is naturally bounded by the inverse of the resolution of the discretization in time. To illustrate our approach, we consider the signal $v = v_1 + v_2 + v_3$ and its corresponding modes $v_i := a_i(t)\, y_i(\theta_i(t))$ displayed in Figure 5.1, where the corresponding base waveforms $y_1$, $y_2$ and $y_3$ are shown in Figure 5.2 and described in Section 5.3.

Figure 5.1: [102, Fig. 30], (1) signal $v$ (the signal is defined over $[-1,1]$ but displayed over $[0, 0.4]$ for visibility) (2) instantaneous frequencies $\omega_i := \dot\theta_i$ (3) amplitudes $a_i$ (4, 5, 6) modes $v_1, v_2, v_3$ over $[0, 0.4]$ (mode plots have also been zoomed in for visibility).

Figure 5.2: [102, Fig. 31], illustrations showing (1) $y_1$ (2) $y_2$ (3) $y_3$.
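A signal of the form (5.0.1)-(5.0.2) is easy to synthesize for experimentation. The sketch below builds one from hand-chosen amplitudes, phases, and overtone coefficients; all the numbers here are placeholders, not the actual data behind Figures 5.1-5.2.

```python
import numpy as np

def waveform(phase, coeffs):
    """Evaluate y(theta) = cos(theta) + sum_{k>=2} c_(k,c) cos(k theta)
    + c_(k,s) sin(k theta), as in (5.0.2); coeffs = {k: (c_kc, c_ks)}."""
    y = np.cos(phase)
    for k, (c_c, c_s) in coeffs.items():
        y += c_c * np.cos(k * phase) + c_s * np.sin(k * phase)
    return y

t = np.linspace(-1.0, 1.0, 4001)
# Each mode: (amplitude a_i(t), phase theta_i(t), overtone coefficients).
modes = [
    (1.0 + 0.2 * t,      40 * np.pi * t,                     {3: (1 / 9, 0.0)}),
    (1.0 - 0.1 * t ** 2, 90 * np.pi * t + np.sin(np.pi * t), {2: (0.10, -0.05)}),
    (np.exp(-t ** 2),    200 * np.pi * t,                    {2: (0.00, 0.08)}),
]
v = sum(a * waveform(theta, c) for a, theta, c in modes)  # observation (5.0.1)
```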
5.1 Micro-local waveform KMD
We now describe the micro-local waveform KMD [102, Sec. 9.1], Algorithm 4, which takes as inputs a time $\tau$, estimated instantaneous amplitude and phase functions $t \mapsto a(t), \theta(t)$, and a signal $v$, and outputs an estimate of the waveform $y(t)$ associated with the phase function $\theta$. The proposed approach is a direct extension of the one presented in Section 4.2; the shaded part of Figure 5.3 shows the new block added to Algorithm 3, the algorithm designed for the case when the waveforms are non-trigonometric and known. As described below, this new block produces an estimator $y_{i,e}$ of the waveform $y_i$ from an estimate $\theta_{i,e}$ of the phase $\theta_i$.
Figure 5.3: [102, Fig. 32], high level structure of Algorithm 4 for the case when the waveforms are unknown.
Given $\alpha > 0$, $\tau \in [-1,1]$, and a differentiable function $t \mapsto \theta(t)$, define the Gaussian process
$$\xi^y_{\tau,\theta}(t) = e^{-\big(\frac{\dot\theta(\tau)(t-\tau)}{\alpha}\big)^2} \Big( X^y_{1,c}\cos\big(\theta(t)\big) + \sum_{k=2}^{k_{\max}} X^y_{k,c}\cos\big(k\theta(t)\big) + X^y_{k,s}\sin\big(k\theta(t)\big) \Big), \qquad (5.1.1)$$
where $X^y_{1,c}$, $X^y_{k,c}$, and $X^y_{k,s}$ are independent $\mathcal{N}(0,1)$ random variables. Let
$$v_\tau(t) := e^{-\big(\frac{\dot\theta(\tau)(t-\tau)}{\alpha}\big)^2} v(t), \quad t \in [-1,1], \qquad (5.1.2)$$
be the windowed signal, and define
$$c^y_{k,j}(\tau, \theta, v) := \lim_{\sigma \to 0} \mathbb{E}\Big[ X^y_{k,j} \,\Big|\, \xi^y_{\tau,\theta} + \sigma Z = v_\tau \Big], \qquad (5.1.3)$$
(with $Z$ denoting independent white noise regularizing the conditioning) and, for $k \in \{2, \ldots, k_{\max}\}$ and $j \in \{c, s\}$, let
$$c_{k,j}(\tau, \theta, v) := \frac{c^y_{k,j}(\tau, \theta, v)}{c^y_{1,c}(\tau, \theta, v)}. \qquad (5.1.4)$$
When the assumed phase function $\theta := \theta_{i,e}$ is close to the phase function $\theta_i$ of the $i$-th mode of the signal $v$ in the expansion (5.0.1), $c_{k,j}(\tau, \theta_{i,e}, v)$ yields an estimate of the Fourier coefficient $c_{i,(k,j)}$ of (5.0.2) of the $i$-th base waveform $y_i$ at time $t = \tau$. This waveform recovery is susceptible to error when there is interference in the overtone frequencies (that is, for the values of $\tau$ at which $k_1\dot\theta_{i_1} \approx k_2\dot\theta_{i_2}$ for $i_1 < i_2$). However, since the coefficient $c_{i,(k,j)}$ is independent of time, we can overcome this by computing $c_{k,j}(\tau, \theta_{i,e}, v)$ at each time $\tau$ and taking the most common approximate value, as follows. Let $T \subset [-1,1]$ be the finite set of values of $\tau$ in the numerical discretization of the time axis, with $N := |T|$ elements. For an interval $I \subset \mathbb{R}$, let
$$T_I := \{\tau \in T \mid c_{k,j}(\tau, \theta_{i,e}, v) \in I\}, \qquad (5.1.5)$$
and let $N_I := |T_I|$ denote the number of elements of $T_I$. Let $I_{\max}$ be a maximizer of the function $I \mapsto N_I$ over intervals of fixed width $\delta$, and define the estimate
$$c_{k,j}(\theta_{i,e}, v) := \begin{cases} \dfrac{1}{N_{I_{\max}}} \displaystyle\sum_{\tau \in T_{I_{\max}}} c_{k,j}(\tau, \theta_{i,e}, v), & \dfrac{N_{I_{\max}}}{N} \ge 0.05, \\[1ex] 0, & \dfrac{N_{I_{\max}}}{N} < 0.05, \end{cases} \qquad (5.1.6)$$
of the Fourier coefficient $c_{i,(k,j)}$, i.e., the average of the values of $c_{k,j}(\tau, \theta_{i,e}, v)$ over $\tau \in T_{I_{\max}}$. The interpretation of the cutoff 0.05 is as follows: if $N_{I_{\max}}/N$ is small, then there is interference in the overtones at all times in $[-1,1]$, and no information may be obtained about the corresponding Fourier coefficient. When the assumed phase function is near that of the lowest frequency mode $v_1$, which we write $\theta := \theta_{1,e}$, Figures 5.4.2 and 5.4.4 show zoomed-in histograms of the functions $\tau \mapsto c_{(3,c)}(\tau, \theta_{1,e}, v)$ and $\tau \mapsto c_{(3,s)}(\tau, \theta_{1,e}, v)$ displayed in Figures 5.4.1 and 5.4.3.
Figure 5.4: [102, Fig. 33], (1) a plot of the function $\tau \mapsto c_{(3,c)}(\tau, \theta_{1,e}, v)$ (2) a histogram (cropping outliers) with bin width 0.002 of the $c_{(3,c)}(\tau, \theta_{1,e}, v)$ values; the true value $c_{1,(3,c)}$ is $1/9$ since $y_1$ is a triangle wave. (3) a plot of the function $\tau \mapsto c_{(3,s)}(\tau, \theta_{1,e}, v)$ (4) a histogram (cropping outliers) with bin width 0.002 of the $c_{(3,s)}(\tau, \theta_{1,e}, v)$ values; the true value $c_{1,(3,s)}$ of this overtone is 0.
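Given the pointwise values $c_{k,j}(\tau, \theta_{i,e}, v)$ on the time grid, the selection rule (5.1.5)-(5.1.6) reduces to a one-dimensional sliding-window mode search. A minimal sketch follows; the estimator values passed in are assumed precomputed by the GPR machinery above.

```python
import numpy as np

def coefficient_estimate(c_values, delta=0.002, cutoff=0.05):
    """Implement (5.1.6): find the width-delta interval I_max capturing the
    most samples of c_{k,j}(tau, theta_{i,e}, v), and return the average of
    the captured values, or 0 if fewer than a `cutoff` fraction of the N
    samples fall in I_max (interference in the overtone at all times)."""
    c = np.sort(np.asarray(c_values, dtype=float))
    N = c.size
    # Sliding the interval so its left end touches a sample loses no counts,
    # so it suffices to scan the intervals [c[q], c[q] + delta].
    right = np.searchsorted(c, c + delta, side="right")
    counts = right - np.arange(N)
    q = int(np.argmax(counts))
    if counts[q] / N < cutoff:
        return 0.0
    return float(c[q:right[q]].mean())

# Example: a stable value 1/9 corrupted over part of the time axis.
vals = np.concatenate([np.full(800, 1 / 9), np.linspace(-0.5, 0.5, 200)])
print(coefficient_estimate(vals + 1e-4 * np.random.randn(1000)))  # ~0.111
```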
On the interval width $\delta$. In our numerical experiments, the recovered modes and waveforms show little sensitivity to the choice of $\delta$: we set $\delta = 0.002$, but widths between 0.001 and 0.01 yield similar results. The rationale for the rough selection of the value of $\delta$ is as follows. Suppose $v = \cos(\omega t)$ and $v' = v + \cos(1.5\,\omega t)$. Define the quantity
$$\max_\tau \big| c_{2,c}(\tau, \theta, v') - c_{2,c}(\tau, \theta, v) \big|, \qquad (5.1.7)$$
with the intuition that it approximates the maximum corruption of the estimated first overtone by the $\cos(1.5\,\omega t)$ term. This quantity provides a good choice for $\delta$; it depends mainly on the selection of $\alpha$ and only marginally on $\omega$. For our selection of $\alpha = 10$, we numerically found its value to be approximately 0.002.
5.2 Iterated micro-local KMD with unknown waveforms algorithm

Algorithm 4 Iterated micro-local KMD with unknown waveforms.
1: $i \leftarrow 1$ and $v^{(1)} \leftarrow v$
2: while true do
3: if $\omega_{\mathrm{low}}(v^{(i)}) = \emptyset$ then
4: break loop
5: else
6: $\theta_{i,e} \leftarrow \omega_{\mathrm{low}}(v^{(i)})$
7: $y_{i,e} \leftarrow \cos(t)$
8: end if
9: $a_{i,e}(t) \leftarrow 0$
10: repeat
11: for $j$ in $\{1, \ldots, i\}$ do
12: $v_{j,\mathrm{res}} \leftarrow v - a_{j,e}\,\bar{y}_{j,e}(\theta_{j,e}) - \sum_{l \neq j,\, l \le i} a_{l,e}\, y_{l,e}(\theta_{l,e})$
13: $a_{j,e}(t) \leftarrow a(t, \theta_{j,e}, v_{j,\mathrm{res}})/\epsilon_1$
14: $\theta_{j,e}(t) \leftarrow \theta_{j,e}(t) + \frac{1}{2}\,\delta\theta(t, \theta_{j,e}, v_{j,\mathrm{res}})$
15: $c_{j,(k,j'),e} \leftarrow c_{k,j'}(\theta_{j,e}, v_{j,\mathrm{res}})$ for $k \le k_{\max}$, $j' \in \{c, s\}$
16: $y_{j,e}(t) \leftarrow \cos(t) + \sum_{k=2}^{k_{\max}} \big( c_{j,(k,c),e}\cos(kt) + c_{j,(k,s),e}\sin(kt) \big)$
17: end for
18: until $\sup_{j,t} \delta\theta(t, \theta_{j,e}, v_{j,\mathrm{res}}) < \epsilon_1$
19: $v^{(i+1)} \leftarrow v - \sum_{j \le i} a_{j,e}\, y_{j,e}(\theta_{j,e})$
20: $i \leftarrow i + 1$
21: end while
22: $m \leftarrow i - 1$
23: if refine_final = True then
24: repeat
25: for $i$ in $\{1, \ldots, m\}$ do
26: $v_{i,\mathrm{res}} \leftarrow v - a_{i,e}\,\bar{y}_{i,e}(\theta_{i,e}) - \sum_{j \neq i} a_{j,e}\, y_{j,e}(\theta_{j,e})$
27: $a_{i,e}(t) \leftarrow a(t, \theta_{i,e}, v_{i,\mathrm{res}})$
28: $\theta_{i,e}(t) \leftarrow \theta_{i,e}(t) + \frac{1}{2}\,\delta\theta(t, \theta_{i,e}, v_{i,\mathrm{res}})$
29: $c_{i,(k,j),e} \leftarrow c_{k,j}\big(\theta_{i,e},\ v - \sum_{j \neq i} a_{j,e}\, y_{j,e}(\theta_{j,e})\big)$
30: $y_{i,e}(t) \leftarrow \cos(t) + \sum_{k=2}^{k_{\max}} \big( c_{i,(k,c),e}\cos(kt) + c_{i,(k,s),e}\sin(kt) \big)$
31: end for
32: until $\sup_{i,t} \delta\theta(t, \theta_{i,e}, v_{i,\mathrm{res}}) < \epsilon_2$
33: end if
34: Return the modes $v_{i,e} \leftarrow a_{i,e}(t)\, y_{i,e}(\theta_{i,e}(t))$ for $i = 1, \ldots, m$
Except for the steps discussed in Section 5.1, Algorithm 4 [102, Sec. 9.2] is identical to Algorithm 3. As illustrated in Figure 5.3, we first identify the lowest frequency and initialize the cosine component of each mode (steps 6 and 7 in Algorithm 4). Next, from steps 10 to 18, we execute a refinement loop similar to that of Algorithm 3, with the addition of an application of micro-local waveform KMD in steps 15 and 16 to estimate the base waveforms. Finally, once each mode has been identified, we again apply waveform estimation in steps 29-30 (after nearly eliminating the other modes, and hence reducing interference in the overtones, for higher accuracy). The overall control flow is sketched in code below.
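In the following sketch, the micro-local GPR estimators of Sections 4.2 and 5.1 are abstracted behind callables: `lowest_phase` plays the role of steps 3-6, while `amp`, `dtheta`, and `coeffs` stand for $a(t, \theta, v)$, $\delta\theta(t, \theta, v)$, and $c_{k,j}(\theta, v)$. This interface is hypothetical, so the code is a structural skeleton of Algorithm 4 rather than a full implementation.

```python
import numpy as np

def iterated_kmd(v, t, kmax, eps1, eps2, lowest_phase, amp, dtheta, coeffs,
                 refine_final=True):
    """Skeleton of Algorithm 4 (unknown base waveforms)."""
    A, TH, C = [], [], []            # a_{i,e}, theta_{i,e}, c_{i,(k,j),e}

    def y_e(i, phase):               # estimated waveform, steps 16 / 30
        y = np.cos(phase)
        for k in range(2, kmax + 1):
            c_c, c_s = C[i][k]
            y += c_c * np.cos(k * phase) + c_s * np.sin(k * phase)
        return y

    def peel(exclude=None):          # signal minus currently estimated modes
        out = v.copy()
        for l in range(len(A)):
            if l != exclude:
                out -= A[l] * y_e(l, TH[l])
        return out

    def sweep(tol):                  # refinement loops, steps 10-18 and 24-32
        while True:
            sup = 0.0
            for j in range(len(A)):
                v_res = peel(exclude=j)
                A[j] = amp(t, TH[j], v_res)
                dth = dtheta(t, TH[j], v_res)
                TH[j] = TH[j] + 0.5 * dth
                C[j] = coeffs(TH[j], v_res, kmax)
                sup = max(sup, np.abs(dth).max())
            if sup < tol:
                return

    while True:                      # steps 2-21: peel off one mode per pass
        theta0 = lowest_phase(peel())  # phase from lowest frequency, step 6
        if theta0 is None:
            break
        TH.append(theta0)            # initialize the new mode, steps 6-9
        A.append(np.zeros_like(t))
        C.append({k: (0.0, 0.0) for k in range(2, kmax + 1)})
        sweep(eps1)
    if refine_final:                 # steps 23-33
        sweep(eps2)
    return [A[i] * y_e(i, TH[i]) for i in range(len(A))]   # step 34
```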
5.3 Numerical experiments
To illustrate the learning of the base waveform of each mode, we take $v(t) = \sum_{i=1}^{3} a_i(t)\, y_i(\theta_i(t))$, where the lowest frequency mode $a_1(t) y_1(\theta_1(t))$ has the (unknown) triangle waveform $y_1$ of Figure 4.1 [102, Sec. 9.3]. We determine the waveforms $y_i$, $i = 2, 3$, randomly by setting $c_{i,(k,j)}$ to be zero with probability 1/2 or to be a random sample from $\mathcal{N}(0, 1/k^4)$ with probability 1/2, for $k \in \{2, \ldots, 7\}$ and $j \in \{c, s\}$; a sketch of this randomization is given below. The waveforms $y_1, y_2, y_3$ thus obtained are illustrated in Figure 5.2. The modes $v_1, v_2, v_3$ and their amplitudes and instantaneous frequencies are shown in Figure 5.1.
Mode | $\|v_{i,e}-v_i\|_{L^2}/\|v_i\|_{L^2}$ | $\|v_{i,e}-v_i\|_{L^\infty}/\|v_i\|_{L^\infty}$ | $\|a_{i,e}-a_i\|_{L^2}/\|a_i\|_{L^2}$ | $\|\theta_{i,e}-\theta_i\|_{L^2}$ | $\|y_{i,e}-y_i\|_{L^2}/\|y_i\|_{L^2}$
$i=1$ | $6.31\times 10^{-3}$ | $2.39\times 10^{-2}$ | $9.69\times 10^{-5}$ | $1.41\times 10^{-5}$ | $6.32\times 10^{-3}$
$i=2$ | $3.83\times 10^{-4}$ | $1.08\times 10^{-3}$ | $5.75\times 10^{-5}$ | $1.16\times 10^{-4}$ | $3.76\times 10^{-4}$
$i=3$ | $3.94\times 10^{-4}$ | $1.46\times 10^{-3}$ | $9.53\times 10^{-5}$ | $6.77\times 10^{-5}$ | $3.80\times 10^{-4}$

Table 5.1: Signal component recovery errors over $[-1,1]$ when the base waveforms are unknown.
We use the same mesh and the same $\alpha$ values as in Section 4.4. The main source of error in the recovery of the first mode's base waveform stems from the fact that a triangle wave has an infinite number of overtones, while our implementation estimates only the first 15 overtones. Indeed, the $L^2$ recovery error of approximating the triangle wave by its first 16 tones is $3.57 \times 10^{-4}$, while the full recovery errors are presented in Table 5.1. We omit the plots of the $y_{i,e}$ as they are visually indistinguishable from those of the $y_i$. Note that the errors improve only slightly away from the borders, as the majority of the error is accounted for by the waveform recovery error.
5.4 Further work in kernel mode decomposition
Micro-local kernel mode decomposition can also be shown to produce mode decompositions in cases with crossing frequencies, vanishing amplitudes, and noisy observations. This setting is summarized by Problem 10, and an illustrative example is given in Example 5.4.1 and Figure 5.5.
Problem 10. For $m \in \mathbb{N}^*$, let $a_1, \ldots, a_m$ be piecewise smooth functions on $[-1,1]$, and let $\theta_1, \ldots, \theta_m$ be strictly increasing functions on $[-1,1]$ such that, for $\epsilon > 0$ and $\delta \in [0,1)$, the length of the set of $t$ with $\dot\theta_i(t)/\dot\theta_j(t) \in [1-\epsilon, 1+\epsilon]$ (for $i \neq j$) is less than $\delta$. Assume that $m$ and the $a_i$, $\theta_i$ are unknown, and that the square-integrable $2\pi$-periodic base waveform $y$ is known. Given the observation
$$v(t) = \sum_{i=1}^{m} a_i(t)\, y\big(\theta_i(t)\big) + v_\sigma(t), \quad t \in [-1,1],$$
where $v_\sigma$ is a realization of white noise with variance $\sigma^2$, recover the modes $v_i(t) := a_i(t)\, y\big(\theta_i(t)\big)$.
Example 5.4.1. Consider the problem of recovering the modes of the signal $v = v_1 + v_2 + v_3 + v_\sigma$ shown in Figure 5.5. Each mode has a triangular base waveform. In this example, $v_3$ has the highest frequency and its amplitude vanishes for $t \gtrsim 0.25$. The frequencies of $v_1$ and $v_2$ cross around $t = 0.25$. The noise $v_\sigma \sim \mathcal{N}(0, \sigma^2\delta(t - s))$ is white with standard deviation $\sigma = 0.5$. While the signal-to-noise ratio is $\mathrm{Var}(v_1 + v_2 + v_3)/\mathrm{Var}(v_\sigma) = 13.1$, the SNR against each of the individual modes, $\mathrm{Var}(v_i)/\mathrm{Var}(v_\sigma)$, $i = 1, 2, 3$, is 2.7, 7.7, and 10.7, respectively.
Figure 5.5: [102, Fig. 34], (1) signal $v$ (2) instantaneous frequencies $\omega_i := \dot\theta_i$ (3) amplitudes $a_i$ (4, 5, 6) modes $v_1, v_2, v_3$.
On a high level, the problem is approached by iterating the following process. Throughout the life of the algorithm, the sets $\mathcal{V}$ and $\mathcal{V}_{\mathrm{seg}}$ are maintained, containing the identified full modes and mode segments (defined below), respectively. First, we identify the lowest instantaneous frequency $\omega_{\mathrm{low}}(\tau)$ at each $\tau \in [-1,1]$. Due to crossings of the instantaneous frequencies and vanishing amplitudes, this frequency may correspond to multiple modes. We also determine the continuity of $\omega_{\mathrm{low}}(\tau)$ and $a_{\mathrm{low}}(\tau)$, and cut the domain at discontinuities. This leads to mode fragments (supported between the domain cuts), which correspond to modes potentially identified only over a subset of the domain $[-1,1]$. Each mode fragment is checked for whether it can be extended with continuous $\omega_{\mathrm{low}}$ or $a_{\mathrm{low}}$, and is extended if possible, leading to mode segments. The user then has the option to disregard, join, or pass segments on to the next iteration, and the sets $\mathcal{V}$ and $\mathcal{V}_{\mathrm{seg}}$ are updated accordingly. The identified modes in $\mathcal{V}$ are refined with a micro-local KMD loop. Finally, the identified modes and mode segments are peeled from the signal, and we iterate the algorithm on the peeled signal. Further details, examples, and results are presented in [102, Sec. 10].
Chapter 6
KERNEL FLOWS
As introduced in [103], the Kernel Flow (KF) algorithm is a method for kernel selection/design in kriging/Gaussian process regression (GPR). It operates on the principle that a kernel is a good model of the data if it is able to accurately make predictions on one subset of the data by observing another subset. We consider the supervised learning problem of approximating an unknown function $u$ mapping $\Omega$ to $\mathbb{R}$ based on the input/output dataset $(t_i, y_i)_{1 \le i \le N}$ (where $u(t_i) = y_i$). We define the vectors of input and output data as $D = (t_i)_i \in \Omega^N$ and $Y = (y_i)_i \in \mathbb{R}^N$. Any non-degenerate kernel $K(t, t')$ can be used to approximate $u$ with the interpolant
$$u^\dagger(t) = \mathbf{k}(t, D)\, \mathbf{K}^{-1} Y, \qquad (6.0.1)$$
writing $\mathbf{k}(t, D) = (K(t, t_i))_i \in \mathbb{R}^{1 \times N}$ and $\mathbf{K} = (K(t_i, t_j))_{i,j} \in \mathbb{R}^{N \times N}$. The kernel selection problem concerns the identification of a good kernel for performing this interpolation on a particular dataset. The KF algorithm's approach to this problem is to use the loss of accuracy incurred by removing half of the dataset as the loss for kernel selection. We will present a pair of variants of the KF algorithm, parametric and non-parametric, beginning with the former [103, Sec. 4], which is outlined in Algorithm 5.
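In code, the interpolant (6.0.1) is a linear solve followed by a kernel evaluation. A minimal sketch follows, with a Gaussian kernel standing in for $K$; the kernel choice here is illustrative, not prescribed by the text.

```python
import numpy as np

def kernel_interpolant(D, Y, kernel):
    """Return the map t -> k(t, D) K^{-1} Y of (6.0.1)."""
    w = np.linalg.solve(kernel(D, D), Y)      # K^{-1} Y
    return lambda t: kernel(t, D) @ w         # k(t, D) K^{-1} Y

def gauss(s, t, gamma=50.0):                  # stand-in non-degenerate kernel
    s, t = np.atleast_1d(s), np.atleast_1d(t)
    return np.exp(-gamma * (s[:, None] - t[None, :]) ** 2)

D = np.linspace(0.0, 1.0, 20)
Y = np.sin(2 * np.pi * D)
u = kernel_interpolant(D, Y, gauss)
print(u(np.array([0.33, 0.71])))              # predictions at new points
```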
6.1 Parametric KF Algorithm

Algorithm 5 Parametric KF Algorithm
1: Input: dataset $(t_i, y_i)_{1 \le i \le N}$, kernel family $K_\beta$, initial parameter $\beta_0$
2: $\beta \leftarrow \beta_0$
3: repeat
4: Randomly select $\{s_f(1), \ldots, s_f(N_f)\}$ from $\{1, \ldots, N\}$ without replacement.
5: $D_f \leftarrow (t_{s_f(q)})_{1 \le q \le N_f}$ and $Y_f \leftarrow (y_{s_f(q)})_{1 \le q \le N_f}$.
6: Randomly select $\{s_c(1), \ldots, s_c(N_c)\}$ from $\{s_f(1), \ldots, s_f(N_f)\}$ without replacement.
7: $D_c \leftarrow (t_{s_c(q)})_{1 \le q \le N_c}$ and $Y_c \leftarrow (y_{s_c(q)})_{1 \le q \le N_c}$.
8: $\beta \leftarrow \beta - \epsilon \nabla_\beta \rho(\beta, D_f, Y_f, D_c, Y_c)$
9: until end criterion
10: Return optimized kernel parameter $\beta^* \leftarrow \beta$
As inputs, in step 1, the parametric variant of the KF algorithm takes a training dataset $(t_i, y_i)_i$, a parametric family of kernels $K_\beta$, and an initial parameter $\beta_0$. The kernel parameter $\beta$ is first initialized to $\beta_0$ in step 2. The main iterative process runs between steps 3 and 9, beginning with the random selection of size-$N_f$ subvectors $D_f$ and $Y_f$ of $D$ and $Y$ (through uniform sampling without replacement in the index set $\{1, \ldots, N\}$) in steps 4 and 5. This selection is randomly subsampled further into length-$N_c$ subvectors $D_c$ and $Y_c$ of $D_f$ and $Y_f$ (by selecting, at random, uniformly and without replacement, $N_c$ of the indices defining $D_f$) in steps 6 and 7.
Finally, in step 8, we update the kernel parameter $\beta$ by a gradient descent step on the loss $\rho$. We define $\rho(\beta, D_f, Y_f, D_c, Y_c)$ to be the squared relative error (in the RKHS norm¹ $\|\cdot\|_{K_\beta}$ defined by $K_\beta$) between the interpolants $u^{\dagger,f}$ and $u^{\dagger,c}$ obtained from the two nested subsets of the dataset and the kernel $K_\beta$, i.e.,²
$$\rho(\beta, D_f, Y_f, D_c, Y_c) := \frac{\|u^{\dagger,f} - u^{\dagger,c}\|_{K_\beta}^2}{\|u^{\dagger,f}\|_{K_\beta}^2} = 1 - \frac{Y_c^\top \mathbf{k}_\beta(D_c, D_c)^{-1} Y_c}{Y_f^\top \mathbf{k}_\beta(D_f, D_f)^{-1} Y_f}. \qquad (6.1.1)$$
Note that the loss $\rho$ is doubly randomized through the selection of the batch $(D_f, Y_f)$ and the sub-batch $(D_c, Y_c)$. The gradient of $\rho$ with respect to $\beta$ is then computed and $\beta$ is updated as $\beta \leftarrow \beta - \epsilon \nabla_\beta \rho$. This process is iterated until an ending criterion is satisfied, such as a maximum number of iteration steps or $\rho < \rho_{\mathrm{stop}}$. The optimized kernel parameter $\beta^*$ is returned, and the corresponding kernel $K_{\beta^*}$ can then be used to interpolate the testing points. The KF algorithm is thus a stochastic gradient descent algorithm on $\rho$.

¹Note that convergence in RKHS norm implies pointwise convergence.
²Here $u^{\dagger,f}(t) = \mathbf{k}_\beta(t, D_f)\,\mathbf{k}_\beta(D_f, D_f)^{-1} Y_f$ and $u^{\dagger,c}(t) = \mathbf{k}_\beta(t, D_c)\,\mathbf{k}_\beta(D_c, D_c)^{-1} Y_c$. Further, $\rho$ admits the representation on the right-hand side of equation (6.1.1), enabling its computation [103, Prop. 3.1].
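One iteration of Algorithm 5 then consists of sampling the batch and sub-batch, evaluating $\rho$, and taking a gradient step. The sketch below uses a finite-difference gradient in place of the analytic $\nabla_\beta\rho$, a small nugget to guard against ill-conditioned kernel matrices, and a Gaussian family as a stand-in parametric kernel; none of these choices are prescribed by the text.

```python
import numpy as np

def rho(beta, Df, Yf, Dc, Yc, kernel, nugget=1e-8):
    """KF loss (6.1.1): 1 - (Yc^T k(Dc,Dc)^{-1} Yc)/(Yf^T k(Df,Df)^{-1} Yf)."""
    Kc = kernel(beta, Dc, Dc) + nugget * np.eye(len(Dc))
    Kf = kernel(beta, Df, Df) + nugget * np.eye(len(Df))
    return 1.0 - (Yc @ np.linalg.solve(Kc, Yc)) / (Yf @ np.linalg.solve(Kf, Yf))

def kf_step(beta, D, Y, Nf, Nc, kernel, lr=0.1, h=1e-5, rng=None):
    """Steps 4-8 of Algorithm 5 with a finite-difference gradient."""
    rng = rng or np.random.default_rng()
    sf = rng.choice(len(D), size=Nf, replace=False)   # batch indices
    sc = sf[rng.choice(Nf, size=Nc, replace=False)]   # sub-batch indices
    Df, Yf, Dc, Yc = D[sf], Y[sf], D[sc], Y[sc]
    grad = np.array([(rho(beta + h * e, Df, Yf, Dc, Yc, kernel)
                      - rho(beta - h * e, Df, Yf, Dc, Yc, kernel)) / (2 * h)
                     for e in np.eye(len(beta))])
    return beta - lr * grad

# Stand-in parametric family: Gaussian kernels with beta = [log bandwidth].
kern = lambda b, s, t: np.exp(-np.exp(b[0]) * (s[:, None] - t[None, :]) ** 2)
D = np.linspace(0.0, 1.0, 128)
Y = np.sin(4 * np.pi * D)
beta = np.array([np.log(100.0)])
for _ in range(100):
    beta = kf_step(beta, D, Y, Nf=64, Nc=32, kernel=kern)
```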
The $\ell^2$-norm variant. In the application of the KF algorithm to NNs, we will consider the $\ell^2$-norm variant of this algorithm (introduced in [103, Sec. 10]), in which the instantaneous loss $\rho$ in (6.1.1) is replaced by the error $e_2 := \|Y_f - u^{\dagger,c}(D_f)\|_2^2$ of $u^{\dagger,c}$ in predicting the labels $Y_f$ (where $\|\cdot\|_2$ denotes the Euclidean $\ell^2$ norm), i.e.,
$$e_2(\beta, D_f, Y_f, D_c, Y_c) := \|Y_f - \mathbf{k}_\beta(D_f, D_c)\,\mathbf{k}_\beta(D_c, D_c)^{-1} Y_c\|_2^2. \qquad (6.1.2)$$
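Assuming the imports and conventions of the previous sketch, the $\ell^2$ variant only swaps out the loss; a minimal rendering of (6.1.2):

```python
def e2(beta, Df, Yf, Dc, Yc, kernel, nugget=1e-8):
    """l2-norm KF loss (6.1.2): squared Euclidean error of the sub-batch
    interpolant u^{dagger,c} in predicting the batch labels Yf."""
    Kc = kernel(beta, Dc, Dc) + nugget * np.eye(len(Dc))
    pred = kernel(beta, Df, Dc) @ np.linalg.solve(Kc, Yc)
    return float(np.sum((Yf - pred) ** 2))
```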
A simple PDE model

To motivate, illustrate, and study the parametric KF algorithm, it is useful to start with an application [103, Sec. 5] to the following simple PDE model, which is amenable to detailed analysis [101]. Let $u$ be the solution of the second order elliptic PDE
$$\begin{cases} -\mathrm{div}\big(a(x)\nabla u(x)\big) = f(x), & x \in \Omega; \\ u = 0 & \text{on } \partial\Omega, \end{cases} \qquad (6.1.3)$$
with interval $\Omega \subset \mathbb{R}^1$ and uniformly elliptic symmetric matrix $a$ with entries in $L^\infty(\Omega)$. We write $\mathcal{L} := -\mathrm{div}(a\nabla\cdot)$ for the corresponding linear bijection from $H_0^1(\Omega)$ to $H^{-1}(\Omega)$. In this simple application, we seek both to estimate the conductivity coefficient $a$ and to recover the solution of (6.1.3) from the data $(x_i, y_i)_{1 \le i \le N}$ and the information $u(x_i) = y_i$. In this example, we use the kernel family $\{G_b \mid b \in L^\infty(\Omega),\ \operatorname{ess\,inf}_\Omega(b) > 0\}$, where each $G_b$ is the Green's function of the operator $-\mathrm{div}(b\nabla\cdot)$. It is known that kernel recovery with $G_a$ is minimax optimal in the energy norm $\|\cdot\|_a$ [101]. In what follows, we will numerically demonstrate that the KF algorithm applied to this problem recovers a kernel $G_{a^*}$ such that $a^* \approx a$.
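For intuition, the kernel matrix $(G_b(x_i, x_j))_{i,j}$ on the grid can be assembled from a tent-element discretization of $-\mathrm{div}(b\nabla\cdot)$. A sketch follows, taking $\Omega = (0,1)$ for concreteness; with piecewise linear elements and nodal point loads, the nodal values of the Green's function coincide with the entries of the inverse stiffness matrix.

```python
import numpy as np

def green_kernel_matrix(b, n=2**8):
    """Nodal kernel matrix (G_b(x_i, x_j)) on the interior grid x_i = i/n of
    Omega = (0, 1), for -d/dx(b(x) d/dx) with zero Dirichlet boundary data.
    With tent elements and nodal point loads, this is the inverse of the
    FEM stiffness matrix A_b."""
    h = 1.0 / n
    be = b((np.arange(n) + 0.5) * h)          # b at the element midpoints
    A = np.diag((be[:-1] + be[1:]) / h)       # (n-1)x(n-1) tridiagonal A_b
    A += np.diag(-be[1:-1] / h, 1) + np.diag(-be[1:-1] / h, -1)
    return np.linalg.inv(A)

# Example conductivity (a placeholder, not the a of Fig. 6.1):
K = green_kernel_matrix(lambda x: 1.0 + 0.5 * np.sin(2 * np.pi * x))
```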
Figure 6.1: [103, Fig. 5], (1) $a$ (2) $f$ (3) $u$ (4) $e(k)$ and $\rho(k)$ (where $b \equiv 1$) vs $k$ (5) $\epsilon(k)$ and $\sigma(k)$ vs $k$ (6) 20 random realizations of $e(i)$ and $\rho(i)$ (7) 20 random realizations of $\epsilon(i)$ and $\sigma(i)$.
Fig. 6.1 provides a numerical illustration of the setting of our example, where $\Omega$ is discretized over $2^8$ equally spaced interior points (with piecewise linear tent finite elements); Figs. 6.1.1-3 show $a$, $f$, and $u$. For $k \in \{1, \ldots, 8\}$ and $i \in I(k) := \{1, \ldots, 2^k - 1\}$, let $x_i^{(k)} = i/2^k$, and write $v_b^{(k)}$ for the interpolation of the data $(x_i^{(k)}, u(x_i^{(k)}))_{i \in I(k)}$ using the kernel $G_b$ (note that $v_a^{(8)} = u$). Let $\|v\|_a$ be the energy norm $\|v\|_a^2 = \int_\Omega (\nabla v)^T a \nabla v$. Take $b \equiv 1$. Fig. 6.1.4 shows (in semilog scale) the values of $e(k) = \|v_a^{(k)} - v_a^{(8)}\|_a^2 / \|v_a^{(8)}\|_a^2$ and $\rho(k) = \|v_b^{(k)} - v_b^{(8)}\|_a^2 / \|v_b^{(8)}\|_a^2$ vs $k$. Note that the value of the ratio $e$ is much smaller when the kernel $G_a$ is used for the interpolation of the data. The geometric decay $e(k) \le C\, 2^{-2k} \|f\|_{L^2(\Omega)}^2 / \|u\|_a^2$ is well known and has been extensively studied in Numerical Homogenization [101].

Fig. 6.1.5 shows (in semilog scale) the values of the prediction errors $\epsilon(k)$ and $\sigma(k)$ (vs $k$), defined (after normalization) to be proportional to $\|v_a^{(k)}(x) - u(x)\|_{L^2(\Omega)}$ and $\|v_1^{(k)}(x) - u(x)\|_{L^2(\Omega)}$. Note again that the prediction error is much smaller when the kernel $G_a$ is used for the interpolation.

Now, let us consider the case where the interpolation points form a random subset of the discretization points. Take $N_f = 2^7$ and $N_c = 2^6$. Let $\{x_1, \ldots, x_{N_f}\}$ be a subset of $N_f$ distinct points of (the discretization points) $\{i/2^8 \mid i \in I(8)\}$ sampled with uniform distribution. Let $\{z_1, \ldots, z_{N_c}\}$ be a subset of $N_c$ distinct points of $\{x_1, \ldots, x_{N_f}\}$ sampled with uniform distribution. Write $v_b^f$ for the interpolation of the data $(x_i, u(x_i))$ using the kernel $G_b$, and write $v_b^c$ for the interpolation of the data $(z_i, u(z_i))$ using the kernel $G_b$. Fig. 6.1.6 shows (in semilog scale) 20 independent random realizations³ of the values of $e(i) = \|v_a^c - v_a^f\|_a^2 / \|v_a^f\|_a^2$ and $\rho(i) = \|v_1^c - v_1^f\|_a^2 / \|v_1^f\|_a^2$. Fig. 6.1.7 shows (in semilog scale) 20 independent random realizations of the values of the prediction errors $\epsilon(i) \propto \|u - v_a^c\|_{L^2(\Omega)}$ and $\sigma(i) \propto \|u - v_1^c\|_{L^2(\Omega)}$. Note again that the values of $e(i)$, $\epsilon(i)$ are consistently and significantly lower than those of $\rho(i)$, $\sigma(i)$.

³Random realizations of the subsets.
Figure 6.2: [103, Fig. 6], (1) $a$ and $b$ for $n = 1$ (2) $a$ and $b$ for $n = 350$ (3) $\rho(n)$ vs $n$ (4) $\epsilon(n)$ vs $n$.
Fig. 6.2 provides a numerical illustration of an implementation of Alg. 5 with $N = N_f = 2^7$, $N_c = 2^6$, and $n$ indexing each iteration of the KF algorithm starting with $n = 1$. In this implementation, $a$, $f$, and $u$ are as in Figs. 6.1.1-3. The training data corresponds to $N$ points $X = \{x_1, \ldots, x_N\}$ uniformly sampled (without replacement) from $\{i/2^8 \mid i \in I(8)\}$. Note that $X$ remains fixed, and since $N = N_f$, the larger batch (as in step 4 in Alg. 5) is always selected as $X$. The purpose of the algorithm is to learn the kernel $G_a$ in the set of kernels $\{G_{a_\theta} \mid \theta\}$ parameterized by the vector $\theta$ via
$$\log a_\theta = \sum_{j=1}^{2^6} \big( \theta_j^c \cos(2\pi j x) + \theta_j^s \sin(2\pi j x) \big). \qquad (6.1.4)$$
Using $n$ to label its progression, Alg. 5 is initialized at $n = 1$ with the guess $a_\theta \equiv 1$ (i.e., $\theta \equiv 0$) (Fig. 6.2.1). Fig. 6.2.2 shows the value of $a_\theta$ after $n = 350$ iterations, which can be seen to approximate $a$. Fig. 6.2.3 shows the value of $\rho(n)$ vs $n$. Fig. 6.2.4 shows the value of the prediction error $\epsilon(n) \propto \|u - v_{a_\theta}^c\|_{L^2(\Omega)}$ vs $n$. The lack of smoothness of the plots of $\rho(n)$, $\epsilon(n)$ vs $n$ originates from the re-sampling of the sub-batch at each step $n$. Further details of this application of the KF algorithm can be found in [103, Sec. 5].
6.2 Non-parametric kernel flows
Recall that the parametric variant of the KF algorithm utilizes a parameterized family of kernels $K_\beta$ and optimizes the interpolation accuracy $\rho$ with respect to $\beta$; the interpolation returned by the algorithm is then performed with $K_{\beta^*}$, where $\beta^*$ is the optimized kernel parameter. In contrast, the non-parametric version [103, Sec. 6] is initialized with a kernel $K$ and learns kernels of the form $K_F(t, t') = K(F(t), F(t'))$, where $F: \Omega \to \Omega$ is an arbitrary function. In this case,
$$\rho(F, D_f, Y_f, D_c, Y_c) := 1 - \frac{Y_c^\top \mathbf{k}_F(D_c, D_c)^{-1} Y_c}{Y_f^\top \mathbf{k}_F(D_f, D_f)^{-1} Y_f} \qquad (6.2.1)$$
is optimized with respect to $F$. In practice, this is done by learning the $\rho$-minimizing optimal deformations of the training inputs in $D_f$ and extending them by kernel interpolation. This leads to a deformation $G_n$, which is used to update $F$ at each iteration according to $F \leftarrow (I + \epsilon G_n) \circ F$, where $I$ is the identity function. Further mathematical detail can be found in [103, Sec. 6]; a schematic of the flow is sketched below. We will next present numerical examples of the non-parametric KF algorithm.
The Swiss Roll cheesecake example
We examine the application of the KF algorithm to the Swiss Roll cheesecake [103, Sec. 7], as illustrated in Figure 6.3.1. The dataset inputs lie in $\Omega = \mathbb{R}^2$ in the shape of two concentric spirals; red points have label $-1$ and blue points have label $1$. The purpose of this exposition is to illustrate the flow and the deformation of each point in the dataset. Hence, all datapoints will be considered as training points, and we will not introduce a testing dataset. We select $N_f = N$ and $N_c = N_f/2$ and initialize