
Chapter I: Introduction

1.6 Patterns and kernels

transform described in Theorem 2.3.1 is found to be within a constant of the minimax optimal recovery. This denoising is interpretable as Gaussian conditioning with a covariance kernel derived from $\mathcal{L}$. The length-scale of the conditioning depends on the signal-to-noise ratio assumed in the problem. A conventional method for denoising smooth signals involves thresholding empirical wavelet coefficients.

We numerically compare the near-minimax optimal recovery with thresholding gamblet transform coefficients in Section 2.4.

Our approach [102] to the mode decomposition problem, kernel mode decomposition (KMD), is outlined in Section 4. There are two main components within the algorithm. The first, which we name max-pooling, estimates the instantaneous phase and frequency of the lowest-frequency mode in a signal and is presented in Section 4.1. It is a close variant of the continuous wavelet transform (CWT) [24], which we summarize in Section 3.2. The second component uses GPR to estimate the instantaneous amplitude and phase of this lowest-frequency mode and is summarized in Section 4.2. We define a GP with a covariance kernel constructed from Gaussian-windowed trigonometric waves, i.e., Gabor wavelets [48]. GPR is then able to estimate the instantaneous phase and amplitude of the lowest-frequency mode. Further, when the base waveform $y_i$ in $v(t) = \sum_i a_i(t)\, y_i(\theta_i(t))$ of each mode is unknown, GPR can be applied to estimate $y_i$ as presented in Section 5.1. These algorithms were extended in [102, Sec. 10], showing the method can be made robust to noise, vanishing amplitudes, and modes with crossing frequencies. We find that patterns within each mode can be estimated with GPR from the sum of modes, even when these patterns are visually indistinguishable in the composite signal.

Learning kernels from patterns

We will discuss the Kernel Flow (KF) algorithm [154] next. At a high level, the algorithm is interpretable as learning kernels from patterns with a method inspired by cross-validation. The algorithm is described in Section 6 and operates under the principle that a kernel is desirable when it can make low-error predictions from small samples of the whole data set. This error is quantified by selecting $N$ random training points and computing the kernel interpolant both of the full sample and of a further random subset of $N/2$ of these points. Writing these interpolants as $u^{\dagger,f}$ and $u^{\dagger,c}$, respectively, the error is quantified by the KF loss function

$\rho := \dfrac{\| u^{\dagger,f} - u^{\dagger,c} \|^2_{k_{\boldsymbol{\theta}}}}{\| u^{\dagger,f} \|^2_{k_{\boldsymbol{\theta}}}}$,    (1.6.1)

and the KF algorithm selects a kernel by optimizing this loss. Since the loss function depends mainly on the training data, this method selects a kernel based on patterns in that data. We present examples throughout Section 6, including on the MNIST dataset. We find that this technique is able to learn kernels which can predict classes accurately while observing only one point per class. Additionally, we observe evidence of unsupervised learning, since archetypes within each class appear to be learned. A further example of pattern learning with the KF algorithm can be found in [57], where it has been applied to data from chaotic dynamical systems to learn a model kernel.
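The following is a minimal numpy sketch of the loss (1.6.1) for a Gaussian kernel; the kernel choice, the jitter term, and the function names are our assumptions, and in practice $\rho$ is minimized over the kernel parameters with stochastic gradient descent as in [154].

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kf_loss(X, y, gamma, rng):
    """rho = ||u_f - u_c||_k^2 / ||u_f||_k^2, where u_f interpolates all N
    sampled points and u_c a random half of them; the RKHS norms are computed
    from the representer coefficients (||sum_i a_i k(x_i, .)||^2 = a^T K a)."""
    N = len(y)
    half = rng.choice(N, N // 2, replace=False)
    K = gaussian_kernel(X, X, gamma)
    a_f = np.linalg.solve(K + 1e-10 * np.eye(N), y)      # full interpolant
    a_c = np.zeros(N)
    a_c[half] = np.linalg.solve(K[np.ix_(half, half)] + 1e-10 * np.eye(len(half)),
                                y[half])                 # half-sample interpolant
    diff = a_f - a_c
    return (diff @ K @ diff) / (a_f @ K @ a_f)

# e.g. rho = kf_loss(X_batch, y_batch, gamma=0.5, rng=np.random.default_rng(0))
```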

An application of the KF algorithm to Artificial Neural Networks (ANNs) will further be demonstrated in Section 7 to improve key performance statistics in MNIST and CIFAR image classification problems. These ANNs are widely used to address this problem and are defined as the mapping

$f_\theta(x) = f^{(n)}_{\theta_n} \circ f^{(n-1)}_{\theta_{n-1}} \circ \cdots \circ f^{(1)}_{\theta_1}(x)$.    (1.6.2)

This map has input $x$ and $n$ layers $f^{(i)}_{\theta_i}(z) = \phi(W_i z + b_i)$, parameterized¹⁷ by the weights and biases $\theta_i := (W_i, b_i)$, $\theta := \{\theta_1, \ldots, \theta_n\}$. The output of $f_\theta$ is in $\mathbb{R}^c$, where $c$ is the number of classes in the dataset. This is converted into a classifier by selecting the component with the largest value. The parameters $\theta$ best modeling the patterns of the data are learned by optimizing the error of the classifier, usually with the cross-entropy loss¹⁸, on the training data.

Kernels can be incorporated into ANNs by allowing $f_\theta$ to map into a higher-dimensional space and applying kernel interpolation on the result, which leads to an improvement of error rates [103, Sec. 10]. Further improvements can be made by reverting to the standard $f_\theta$ largest-component classifier and constructing a kernel dependent on the intermediate-layer output

$h^{(i)}(x) := f^{(i)}_{\theta_i} \circ f^{(i-1)}_{\theta_{i-1}} \circ \cdots \circ f^{(1)}_{\theta_1}(x)$,    (1.6.3)

for $i = 1, \ldots, n$. The KF loss corresponding to this kernel is then used in tandem with the standard cross-entropy loss, which leads to improvements in testing error, generalization gap, and robustness to distributional shift. Details on our numerical findings can be found in Section 7.1. Note that kernel interpolation itself is not directly used as a classifier; the KF loss is used only as a regularization of the loss function used in the optimization of the ANN parameters. This application of kernels is a novel method for training and clustering intermediate-layer outputs in conjunction with the final output $f_\theta$. We present numerical experiments that show the KF loss function aids in the learning of parameters which most accurately classify patterns in images.

¹⁷Weights $W_i$ are linear operators and biases $b_i$ are vectors. The function $\phi$ is an arbitrary function, typically taken to be the ReLU, $\phi(z) = \max(0, z)$.

¹⁸This loss is defined in equation (7.0.4).
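As a rough illustration of how the KF loss can act as a regularizer, the sketch below evaluates a cross-entropy term plus a KF term computed from a kernel on intermediate-layer features of a batch; it is written in plain numpy for clarity (an actual implementation would use an autodiff framework), and the kernel, the one-hot interpolation targets, and the weight `lam` are our assumptions rather than the exact construction of Section 7.

```python
import numpy as np

def rbf(F, gamma):
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def cross_entropy(logits, labels):
    z = logits - logits.max(1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def combined_loss(logits, features, labels, labels_onehot, gamma, lam, rng):
    """Cross-entropy plus a KF-style term: interpolate one-hot labels from a
    random half of the batch with a kernel on intermediate features h(x), and
    measure the relative RKHS error against the full-batch interpolant."""
    N = len(labels)
    half = rng.choice(N, N // 2, replace=False)
    K = rbf(features, gamma) + 1e-8 * np.eye(N)
    a_f = np.linalg.solve(K, labels_onehot)
    a_c = np.zeros_like(a_f)
    a_c[half] = np.linalg.solve(K[np.ix_(half, half)], labels_onehot[half])
    diff = a_f - a_c
    rho = np.trace(diff.T @ K @ diff) / np.trace(a_f.T @ K @ a_f)
    return cross_entropy(logits, labels) + lam * rho
```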

Chapter 2

DENOISING

2.1 Introduction to the denoising problem

[37–39] addressed the problem of recovering a smooth signal from noisy observations by soft-thresholding empirical wavelet coefficients [37]. More recently, [33] considered the recovery of $x \in X$ based on the observation of $Tx + \zeta$, where the $\zeta_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$ and $T$ is a compact linear operator between Hilbert spaces $X$ and $Y$, with the prior that $x$ lies in an ellipsoid defined by the eigenvectors of $T^* T$. [33] showed that thresholding the coefficients of the corrupted signal $Tx + \zeta$ in the basis formed by the singular value decomposition (SVD) of $T$ (which can be computed in $O(N^3)$ complexity) approaches the minimax recovery up to a fixed multiplicative constant.
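For intuition, a minimal numpy sketch of an SVD-based recovery in the spirit of [33] is given below; the hard truncation rule `s > tau` is only illustrative and is not the precise thresholding estimator analyzed there.

```python
import numpy as np

def svd_threshold_recovery(T, y, tau):
    """Recover x from y = T x + noise: expand y in the left singular basis of T,
    keep components whose singular values exceed tau, and invert on those
    components only.  The truncation rule is purely illustrative."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)   # O(N^3) for an N x N operator
    coeffs = U.T @ y                                   # coefficients in the SVD basis
    keep = s > tau
    return Vt[keep].T @ (coeffs[keep] / s[keep])
```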

The contributions presented in this section [153] address denoising in the following formulation. Suppose

$\mathcal{L} : \mathcal{H}^s_0(\Omega) \to \mathcal{H}^{-s}(\Omega)$    (2.1.1)

is a symmetric, positive, local¹ linear bijection with $s \in \mathbb{N}^*$ and regular bounded $\Omega \subset \mathbb{R}^d$ ($d \in \mathbb{N}$). Let $\|\cdot\|$ be the energy norm defined by

$\|u\|^2 := \int_\Omega u \,\mathcal{L} u$,    (2.1.2)

and write

$\langle u, v \rangle := \int_\Omega u \,\mathcal{L} v$    (2.1.3)

for the associated scalar product. Further, define

$V_M := \{ u \in \mathcal{H}^s_0(\Omega) : \mathcal{L} u \in L^2(\Omega) \text{ and } \|\mathcal{L} u\|_{L^2(\Omega)} \le M \}$.    (2.1.4)

Further, let

𝜁 ∼ N (0, 𝜎2𝛿(π‘₯βˆ’π‘¦)) (2.1.5) be white noise in domain Ξ© with variance 𝜎2. The following is the continuous version of the denoising problem studied in this section.

Problem 5. Let $u$ be an unknown element of $V_M$. Given the noisy observation $\eta = u + \zeta$, find an approximation of $u$ that is as accurate as possible in the energy norm $\|\cdot\|$.

¹Symmetric, positive, and local are defined, respectively, as $\int_\Omega u \mathcal{L} v = \int_\Omega v \mathcal{L} u$; $\int_\Omega u \mathcal{L} u > 0$ for $u \neq 0$; and $\int_\Omega u \mathcal{L} v = 0$ for $u, v \in \mathcal{H}^s_0(\Omega)$ with disjoint supports.

This problem will be illustrated in the $s = 1$ case with the linear differential operator $\mathcal{L} = -\mathrm{div}\, a(x) \nabla \cdot$, where the conductivity $a$ is a uniformly elliptic symmetric $d \times d$ matrix with entries in $L^\infty(\Omega)$. This example is of practical importance in groundwater flow modeling (where $a$ is the porosity of the medium) and in electrostatics (where $a$ is the dielectric constant), and in both applications $a$ may be rough (non-smooth) [7, 19].

Example 2.1.1. Assuming

$\mathcal{L} = -\mathrm{div}\, a(x) \nabla \cdot : \mathcal{H}^1_0(\Omega) \to \mathcal{H}^{-1}(\Omega)$,    (2.1.6)

Problem 5 then corresponds to the problem of recovering the solution of the PDE

$-\mathrm{div}\big(a(x) \nabla u(x)\big) = f(x)$ for $x \in \Omega$, with $u = 0$ on $\partial\Omega$,    (2.1.7)

from its noisy observation $\eta = u + \zeta$ with the knowledge $\|f\|_{L^2(\Omega)} < M$.

This problem is addressed by expressing $\eta$ in the gamblet transform adapted to the operator $\mathcal{L}$ and applying a truncation to the series. This method is proved to yield a recovery within a constant of the minimax optimal recovery [153]. It is numerically compared to thresholding the gamblet transform coefficients as well as to regularization, the minimization of

$\| v(\eta) - \eta \|^2_{L^2(\Omega)} + \alpha \| v(\eta) \|^2$.    (2.1.8)

2.2 Summary of operator-adapted wavelets

We proceed by reviewing operator-adapted wavelets as in [153, Sec. 2], also named gamblets in reference to their game-theoretic interpretation, and their main properties [99, 101, 104, 119]. They are constructed from a hierarchy of measurement functions and an operator. Theorem 2.2.4 shows these gamblets are simultaneously associated with Gaussian conditioning, optimal recovery, and game theory. By selecting the measurement functions to be pre-Haar wavelets, the gamblets are localized both in space and in the eigenspace of the operator.

Hierarchy of measurement functions

Letπ‘ž ∈ Nβˆ— (used to represent a number of scales). Let(I(π‘˜))1β‰€π‘˜β‰€π‘ž be a hierarchy of labels defined as follows. I(π‘ž) is a set of π‘ž-tuples consisting of elements 𝑖 = (𝑖1, . . . , π‘–π‘ž). For 1 ≀ π‘˜ ≀ π‘ž and𝑖 ∈ I(π‘ž),𝑖(π‘˜) := (𝑖1, . . . , π‘–π‘˜) andI(π‘˜) is the set of

π‘˜-tuplesI(π‘˜) = {𝑖(π‘˜)|𝑖 ∈ I(π‘ž)}. For 1 ≀ π‘Ÿ ≀ π‘˜ ≀ π‘ž and 𝑗 = (𝑗1, . . . , π‘—π‘˜) ∈ I(π‘˜), we write 𝑗(π‘Ÿ) = (𝑗1, . . . , π‘—π‘Ÿ). We say that 𝑀 is aI(π‘˜) Γ— I(𝑙) matrix if its rows and columns are indexed by elements ofI(π‘˜) andI(𝑙), respectively.

Let {πœ™(π‘˜)

𝑖 |π‘˜ ∈ {1, . . . , π‘ž}, 𝑖 ∈ I(π‘˜)} be a nested hierarchy of elements of Hβˆ’π‘ (Ξ©) such that(πœ™(π‘ž)

𝑖 )π‘–βˆˆI(π‘ž) are linearly independent and πœ™(π‘˜)

𝑖 = Γ•

π‘—βˆˆI(π‘˜+1)

πœ‹(π‘˜ , π‘˜+1)

𝑖, 𝑗 πœ™(π‘˜+1)

𝑗 (2.2.1)

for𝑖 ∈ I(π‘˜),π‘˜ ∈ {1, . . . , π‘žβˆ’1}, whereπœ‹(π‘˜ , π‘˜+1) is anI(π‘˜) Γ— I(π‘˜+1) matrix and

πœ‹(π‘˜ , π‘˜+1)πœ‹(π‘˜+1, π‘˜) = 𝐼(π‘˜). (2.2.2)

In (2.2.2), πœ‹(π‘˜+1, π‘˜) is the transpose of πœ‹(π‘˜ , π‘˜+1) and 𝐼(π‘˜) is the I(π‘˜) Γ— I(π‘˜) identity matrix.

Hierarchy of operator-adapted pre-wavelets

Let $(\psi^{(k)}_i)_{i \in \mathcal{I}^{(k)}}$ be the hierarchy of optimal recovery splines associated with $(\phi^{(k)}_i)_{i \in \mathcal{I}^{(k)}}$, i.e., for $k \in \{1, \ldots, q\}$ and $i \in \mathcal{I}^{(k)}$,

$\psi^{(k)}_i = \sum_{j \in \mathcal{I}^{(k)}} A^{(k)}_{i,j} \mathcal{L}^{-1} \phi^{(k)}_j$,    (2.2.3)

where

$A^{(k)} := (\Theta^{(k)})^{-1}$    (2.2.4)

and $\Theta^{(k)}$ is the $\mathcal{I}^{(k)} \times \mathcal{I}^{(k)}$ symmetric positive definite Gramian matrix with entries (writing $[\phi, v]$ for the duality pairing between $\phi \in \mathcal{H}^{-s}(\Omega)$ and $v \in \mathcal{H}^s_0(\Omega)$)

$\Theta^{(k)}_{i,j} = [\phi^{(k)}_i, \mathcal{L}^{-1} \phi^{(k)}_j]$.    (2.2.5)

Note that 𝐴(π‘˜) is the stiffness matrix of the elements(πœ“(

π‘˜)

𝑖 )π‘–βˆˆI(π‘˜) in the sense that 𝐴(

π‘˜) 𝑖, 𝑗 =

πœ“(

π‘˜) 𝑖 , πœ“(

π‘˜) 𝑗

. (2.2.6)

Writing Ξ¦(π‘˜) :=span{πœ™(π‘˜)

𝑖 | 𝑖 ∈ I(π‘˜)} and𝔙(π‘˜) := span{πœ“(π‘˜)

𝑖 | 𝑖 ∈ I(π‘˜)}, Ξ¦(π‘˜) βŠ‚ Ξ¦(π‘˜+1) and Ξ¨(π‘˜) = Lβˆ’1Ξ¦(π‘˜) imply Ξ¨(π‘˜) βŠ‚ Ξ¨(π‘˜+1). We further write [πœ™(π‘˜), 𝑒] =

[πœ™(π‘˜)

𝑖 , 𝑒]

π‘–βˆˆI(π‘˜) ∈RI

(π‘˜). The(πœ™(π‘˜)

𝑖 )π‘–βˆˆI(π‘˜) and (πœ“(π‘˜)

𝑖 )π‘–βˆˆI(π‘˜) form a bi-orthogonal system in the sense that [πœ™(π‘˜)

𝑖 , πœ“(π‘˜)

𝑗 ] =𝛿𝑖, 𝑗 for𝑖, 𝑗 ∈ I(π‘˜) (2.2.7) and the

Β·,Β·

-orthogonal projection of𝑒 ∈ H𝑠

0(Ξ©)onΞ¨(π‘˜) is 𝑒(π‘˜) := Γ•

π‘–βˆˆI(π‘˜)

[πœ™(

π‘˜) 𝑖 , 𝑒]πœ“(

π‘˜)

𝑖 . (2.2.8)
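The following linear-algebra caricature of (2.2.3)-(2.2.8) may help fix ideas: the operator is a symmetric positive definite matrix, the measurement functions are columns of a matrix, and the duality pairing is an ordinary dot product. This is our simplification for illustration, not the multigrid construction used in practice.

```python
import numpy as np

def gamblets(L, Phi):
    """Discrete caricature of (2.2.3)-(2.2.5): L is an (N, N) SPD matrix standing
    in for the operator, Phi an (N, m) matrix whose columns play the role of the
    measurement functions, and the duality pairing [phi, u] is Phi.T @ u."""
    Linv_Phi = np.linalg.solve(L, Phi)    # columns L^{-1} phi_j
    Theta = Phi.T @ Linv_Phi              # Gramian (2.2.5)
    A = np.linalg.inv(Theta)              # stiffness matrix (2.2.4), symmetric
    Psi = Linv_Phi @ A                    # columns psi_i = sum_j A_{ij} L^{-1} phi_j (2.2.3)
    return Psi, A

def project(Phi, Psi, u):
    """<.,.>-orthogonal projection (2.2.8): u^(k) = sum_i [phi_i, u] psi_i."""
    return Psi @ (Phi.T @ u)
```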

Multiple interpretations of operator-adapted pre-wavelets

Using the operator-adapted pre-wavelets $\psi^{(k)}_i$, we summarize the connections between optimal recovery, game theory, and Gaussian conditioning. First, we define Gaussian fields, a generalization of Gaussian processes.

Definition 2.2.1. The canonical Gaussian field $\xi$ associated with the operator $\mathcal{L} : \mathcal{H}^s_0(\Omega) \to \mathcal{H}^{-s}(\Omega)$ is defined such that $\phi \mapsto [\phi, \xi]$ is the linear isometry from $\mathcal{H}^{-s}(\Omega)$ to a Gaussian space characterized by

$[\phi, \xi] \sim \mathcal{N}(0, \|\phi\|_*^2)$ and $\mathrm{Cov}\big([\phi, \xi], [\varphi, \xi]\big) = \langle \phi, \varphi \rangle_*$,    (2.2.9)

where $\|\phi\|_* = \sup_{u \in \mathcal{H}^s_0(\Omega)} \frac{\int_\Omega \phi u}{\|u\|}$ is the dual norm of $\|\cdot\|$.

Remark 2.2.2. When $s > d/2$, the evaluation functional $\delta_x(f) = f(x)$ is continuous. Hence, $([\delta_x, \xi])_{x \in \Omega}$ is naturally isomorphic to a Gaussian process with covariance function $k(x, x') = \langle \delta_x, \delta_{x'} \rangle_*$.

Several notable properties of these pre-wavelets are summarized in the following result. Recall that we write $[\phi^{(k)}, u] = \big([\phi^{(k)}_i, u]\big)_{i \in \mathcal{I}^{(k)}} \in \mathbb{R}^{\mathcal{I}^{(k)}}$.

Theorem 2.2.3. Consider pre-wavelets $\psi^{(k)}_i$ adapted to the operator $\mathcal{L}$ constructed with measurement functions $\phi^{(k)}_i$. Further, suppose that for $u \in \mathcal{H}^s_0(\Omega)$ we define $v^\dagger(u) = u^{(k)} = \sum_{i \in \mathcal{I}^{(k)}} [\phi^{(k)}_i, u]\, \psi^{(k)}_i$.

1. For fixed $u \in \mathcal{H}^s_0(\Omega)$, $v^\dagger(u)$ is the minimizer of

   Minimize $\|\psi\|$ subject to $\psi \in \mathcal{H}^s_0(\Omega)$ and $[\phi^{(k)}, \psi] = [\phi^{(k)}, u]$.    (2.2.10)

2. For fixed $u \in \mathcal{H}^s_0(\Omega)$, $v^\dagger(u)$ is the minimizer of

   Minimize $\|u - \psi\|$ subject to $\psi \in \mathrm{span}\{\psi^{(k)}_i : i \in \mathcal{I}^{(k)}\}$.    (2.2.11)

3. For the canonical Gaussian field $\xi \sim \mathcal{N}(0, \mathcal{L}^{-1})$,

   $v^\dagger(u) = \mathbb{E}\big[\xi \mid [\phi^{(k)}, \xi] = [\phi^{(k)}, u]\big]$.    (2.2.12)

4. It is true that²

   $v^\dagger \in \mathrm{argmin}_{v \in L(\Phi, \mathcal{H}^s_0(\Omega))} \sup_{u \in \mathcal{H}^s_0(\Omega)} \dfrac{\|u - v(u)\|}{\|u\|}$.    (2.2.13)

Proof. (1) is a result of [101, Cor. 3.4], (2) is equivalent to [101, Thm. 12.2], and (3) and (4) are results in [101, Sec. 8.5].

This result shows that the operator-adapted pre-wavelet transform defined by $v^\dagger(u) = u^{(k)}$ is an optimal recovery in the sense of Theorem 2.2.3.1-2. Simultaneously, $v^\dagger(u)$ is the conditional expectation of the canonical Gaussian field with respect to the measurements $[\phi^{(k)}, \cdot]$ as in Theorem 2.2.3.3. Another interpretation of the transform is game theoretic, as expressed in Theorem 2.2.3.4. Equation (2.2.13) represents the adversarial two-player game where player I selects $u \in \mathcal{H}^s_0(\Omega)$ and player II approximates $u$ with $v(u)$ from the measurements $[\phi^{(k)}, u]$. Players I and II aim to maximize and minimize, respectively, the recovery error of $v(u)$. This game-theoretic interpretation inspires the name gamblets, referring to operator-adapted wavelets.

Note that the pre-waveletsπœ“(π‘˜)

𝑖 lie on only one level of the hierarchy. The following addresses the construction of a wavelet decomposition ofH𝑠

0(Ξ©)on all hierarchical levels.

Operator-adapted wavelets

Let(J(π‘˜))2β‰€π‘˜β‰€π‘žbe a hierarchy of labels such that, writing|J(π‘˜)|for the cardinal of J(π‘˜),

|J(π‘˜)|=|I(π‘˜)| βˆ’ |I(π‘˜βˆ’1)|. (2.2.14) Forπ‘˜ ∈ {2, . . . , π‘ž}, letπ‘Š(π‘˜) be a J(π‘˜) Γ— I(π‘˜) matrix such that3

Ker(πœ‹(π‘˜βˆ’1, π‘˜)) =Im(π‘Š(π‘˜),𝑇). (2.2.15)

Forπ‘˜ ∈ {2, . . . , π‘ž}and𝑖 ∈ J(π‘˜) define πœ’(π‘˜)

𝑖 := Γ•

π‘—βˆˆI(π‘˜)

π‘Š(π‘˜)

𝑖, 𝑗 πœ“(π‘˜)

𝑗 , (2.2.16)

and write π”š(π‘˜) := span{πœ’(π‘˜)

𝑖 | 𝑖 ∈ J(π‘˜)}. Then π”š(π‘˜) is the

Β·,Β·

-orthogonal complement of𝔙(π‘˜βˆ’1) in𝔙(π‘˜), i.e. 𝔙(π‘˜) = 𝔙(π‘˜βˆ’1) βŠ•π”š(π‘˜),and

𝔙(π‘ž) = 𝔙(1) βŠ•π”š(2) βŠ• Β· Β· Β· βŠ•π”š(π‘ž). (2.2.17)

²$L(\Phi, \mathcal{H}^s_0(\Omega))$ is defined as the set of $\mathcal{H}^s_0(\Omega) \to \mathcal{H}^s_0(\Omega)$ functions of the form $v(u) = \Psi\big([\phi^{(k)}, u]\big)$ with measurable $\Psi : \mathbb{R}^{\mathcal{I}^{(k)}} \to \mathcal{H}^s_0(\Omega)$.

³We write $M^{(k),T}$ and $M^{(k),-1}$ for the transpose and inverse of a matrix $M^{(k)}$.

Forπ‘˜ ∈ {2, . . . , π‘ž}write

𝐡(π‘˜) :=π‘Š(π‘˜)𝐴(π‘˜)π‘Š(π‘˜),𝑇 . (2.2.18) Note that𝐡(π‘˜) is the stiffness matrix of the elements(πœ’(π‘˜)

𝑗 )π‘—βˆˆJ(π‘˜), i.e., 𝐡(π‘˜)

𝑖, 𝑗 = πœ’(π‘˜)

𝑖 , πœ’(π‘˜)

𝑗

. (2.2.19)

Further, forπ‘˜ ∈ {2, . . . , π‘ž}, define

𝑁(π‘˜) := 𝐴(π‘˜)π‘Š(π‘˜),𝑇𝐡(π‘˜),βˆ’1 (2.2.20)

and, for𝑖 ∈ J(π‘˜),

πœ™(π‘˜), πœ’

𝑖 := Γ•

π‘—βˆˆI(π‘˜)

𝑁(π‘˜),𝑇

𝑖, 𝑗 πœ™(π‘˜)

𝑗 . (2.2.21)

Then defining𝑒(π‘˜) as in (2.2.8), it holds true that forπ‘˜ ∈ {2, . . . , π‘ž},𝑒(π‘˜)βˆ’π‘’(π‘˜βˆ’1) is the

Β·,Β·

-orthogonal projection of𝑒 onπ”š(π‘˜) and 𝑒(π‘˜) βˆ’π‘’(π‘˜βˆ’1) = Γ•

π‘–βˆˆJ(π‘˜)

[πœ™(

π‘˜), πœ’ 𝑖 , 𝑒]πœ’(

π‘˜)

𝑖 . (2.2.22)

To simplify notations, write $\mathcal{J}^{(1)} := \mathcal{I}^{(1)}$, $B^{(1)} := A^{(1)}$, $N^{(1)} := I^{(1)}$, $\phi^{(1),\chi}_i := \phi^{(1)}_i$ for $i \in \mathcal{J}^{(1)}$, $\mathcal{J} := \mathcal{J}^{(1)} \cup \cdots \cup \mathcal{J}^{(q)}$, $\chi_i := \chi^{(k)}_i$ and $\phi^\chi_i := \phi^{(k),\chi}_i$ for $i \in \mathcal{J}^{(k)}$ and⁴ $1 \le k \le q$. Then the $\phi^\chi_i$ and $\chi_i$ form a bi-orthogonal system, i.e.,

$[\phi^\chi_i, \chi_j] = \delta_{i,j}$ for $i, j \in \mathcal{J}$    (2.2.23)

and

$u^{(q)} = \sum_{i \in \mathcal{J}} [\phi^\chi_i, u]\, \chi_i$.    (2.2.24)

Simplifying notations further, we will write $[\phi^\chi, u]$ for the $\mathcal{J}$-vector with entries $[\phi^\chi_i, u]$ and $\chi$ for the $\mathcal{J}$-vector with entries $\chi_i$, so that (2.2.24) can be written

$u^{(q)} = [\phi^\chi, u] \cdot \chi$.    (2.2.25)

Further, define the $\mathcal{J} \times \mathcal{J}$ block-diagonal matrix $B$ by $B_{i,j} = B^{(k)}_{i,j}$ if $i, j \in \mathcal{J}^{(k)}$ and $B_{i,j} = 0$ otherwise. Note that $B_{i,j} = \langle \chi_i, \chi_j \rangle$. When $q = \infty$ and $\cup_{k=1}^\infty \Phi^{(k)}$ is dense in $\mathcal{H}^{-s}(\Omega)$, then, writing $\mathfrak{W}^{(1)} := \mathfrak{V}^{(1)}$,

$\mathcal{H}^s_0(\Omega) = \oplus_{k=1}^\infty \mathfrak{W}^{(k)}$,    (2.2.26)

$u^{(q)} = u$, and (2.2.24) is the corresponding multi-resolution decomposition of $u$. When $q < \infty$, $u^{(q)}$ is the projection of $u$ on $\oplus_{k=1}^q \mathfrak{W}^{(k)}$ and (2.2.25) is the corresponding multi-resolution decomposition. Note that the optimal recovery, game theory, and Gaussian conditioning results of Theorem 2.2.3 also hold for the wavelets.

⁴The dependence on $k$ is left implicit to simplify notation; for $i \in \mathcal{J}$ there exists a unique $k$ such that $i \in \mathcal{J}^{(k)}$.

Theorem 2.2.4. Consider the wavelets $\chi_i$ adapted to the operator $\mathcal{L}$ constructed with measurement functions $\phi^\chi_i$. Further, suppose that for $u \in \mathcal{H}^s_0(\Omega)$ we define $v^\dagger(u) = u^{(q)} = [\phi^\chi, u] \cdot \chi$.

1. For fixed $u \in \mathcal{H}^s_0(\Omega)$, $v^\dagger(u)$ is the minimizer of

   Minimize $\|\psi\|$ subject to $\psi \in \mathcal{H}^s_0(\Omega)$ and $[\phi^\chi, \psi] = [\phi^\chi, u]$.    (2.2.27)

2. For fixed $u \in \mathcal{H}^s_0(\Omega)$, $v^\dagger(u)$ is the minimizer of

   Minimize $\|u - \psi\|$ subject to $\psi \in \mathrm{span}\{\chi_i : i \in \mathcal{J}\}$.    (2.2.28)

3. For the canonical Gaussian field $\xi \sim \mathcal{N}(0, \mathcal{L}^{-1})$,

   $v^\dagger(u) = \mathbb{E}\big[\xi \mid [\phi^\chi, \xi] = [\phi^\chi, u]\big]$.    (2.2.29)

4. It is true that⁵

   $v^\dagger \in \mathrm{argmin}_{v \in L(\Phi, \mathcal{H}^s_0(\Omega))} \sup_{u \in \mathcal{H}^s_0(\Omega)} \dfrac{\|u - v(u)\|}{\|u\|}$.    (2.2.30)

Pre-Haar wavelet measurement functions

The gamblets used in the subsequent developments will use pre-Haar wavelets (as defined below) as measurement functions $\phi^{(k)}_i$, and our main near-optimal denoising estimates will be derived from their properties (summarized in Thm. 2.2.5).

Let $\delta, h \in (0, 1)$. Let $(\tau^{(k)}_i)_{i \in \mathcal{I}^{(k)}}$ be uniformly Lipschitz convex sets forming a nested partition of $\Omega$, i.e., such that $\Omega = \cup_{i \in \mathcal{I}^{(k)}} \tau^{(k)}_i$, $k \in \{1, \ldots, q\}$, is a disjoint union except for the boundaries, and $\tau^{(k)}_i = \cup_{j \in \mathcal{I}^{(k+1)} : j^{(k)} = i} \tau^{(k+1)}_j$, $k \in \{1, \ldots, q-1\}$.

⁵Here $L(\Phi, \mathcal{H}^s_0(\Omega))$ is defined as the set of $\mathcal{H}^s_0(\Omega) \to \mathcal{H}^s_0(\Omega)$ functions of the form $v(u) = \Psi\big([\phi^\chi, u]\big)$ with measurable $\Psi : \mathbb{R}^{\mathcal{J}} \to \mathcal{H}^s_0(\Omega)$.

Assume that each𝜏(π‘˜)

𝑖 , contains a ball of radius𝛿 β„Žπ‘˜, and is contained in the ball of radiusπ›Ώβˆ’1β„Žπ‘˜. Writing |𝜏(

π‘˜)

𝑖 |for the volume of𝜏(

π‘˜) 𝑖 , take πœ™(

π‘˜)

𝑖 :=1

𝜏(π‘˜)

𝑖

|𝜏(

π‘˜)

𝑖 |βˆ’12 . (2.2.31)

The nesting relation (2.2.1) is then satisfied with πœ‹(π‘˜ , π‘˜+1)

𝑖, 𝑗 := |𝜏(π‘˜+1)

𝑗 |12|𝜏(π‘˜)

𝑖 |βˆ’12 for 𝑗(π‘˜) =𝑖andπœ‹(

π‘˜ , π‘˜+1)

𝑖, 𝑗 :=0 otherwise.

Forπ‘˜ ∈ {2, . . . , π‘ž}, let J(π‘˜) be a finite set of π‘˜-tuples of the form 𝑗 = (𝑗1, . . . , π‘—π‘˜) such that{𝑗(π‘˜βˆ’1) | 𝑗 ∈ J(π‘˜)} =I(π‘˜βˆ’1), and for𝑖 ∈ I(π‘˜βˆ’1), Card{𝑗 ∈ J(π‘˜) | 𝑗(π‘˜βˆ’1) = 𝑖} =Card{𝑠 ∈ I(π‘˜) |𝑠(π‘˜βˆ’1) =𝑖} βˆ’1. Note that the cardinalities of these sets satisfy (2.2.14).

Write 𝐽(π‘˜) for the J(π‘˜) Γ— J(π‘˜) identity matrix. For π‘˜ = 2, . . . , π‘ž, let π‘Š(π‘˜) be a J(π‘˜) Γ— I(π‘˜) matrix such that Im(π‘Š(π‘˜),𝑇) =Ker(πœ‹(π‘˜βˆ’1, π‘˜)),π‘Š(π‘˜)(π‘Š(π‘˜))𝑇 =𝐽(π‘˜) and π‘Š(π‘˜)

𝑖, 𝑗 =0 for𝑖(π‘˜βˆ’1) β‰  𝑗(π‘˜βˆ’1).

Theorem 2.2.5. With pre-Haar wavelet measurement functions, it holds true that

1. For $k \in \{1, \ldots, q\}$ and $u \in \mathcal{L}^{-1} L^2(\Omega)$,

   $\| u - u^{(k)} \| \le C h^{k s} \| \mathcal{L} u \|_{L^2(\Omega)}$.    (2.2.32)

2. Writing $\mathrm{Cond}(M)$ for the condition number of a matrix $M$, we have for $k \in \{1, \ldots, q\}$

   $C^{-1} h^{-2(k-1)s} J^{(k)} \le B^{(k)} \le C h^{-2 k s} J^{(k)}$    (2.2.33)

   and $\mathrm{Cond}(B^{(k)}) \le C h^{-2s}$.

3. For $i \in \mathcal{I}^{(k)}$ and $x^{(k)}_i \in \tau^{(k)}_i$,

   $\| \psi_i \|_{\mathcal{H}^s(\Omega \setminus B(x^{(k)}_i, n h))} \le C h^{-s} e^{-n/C}$.    (2.2.34)

4. The wavelets $\psi^{(k)}_i, \chi^{(k)}_i$ and stiffness matrices $A^{(k)}, B^{(k)}$ can be computed to precision $\epsilon$ (in $\|\cdot\|$-energy norm for elements of $\mathcal{H}^s_0(\Omega)$ and in Frobenius norm for matrices) in $O(N \log^{3d} \frac{N}{\epsilon})$ complexity.

Furthermore, the constant $C$ depends only on $\delta$, $\Omega$, $d$, $s$, $\|\mathcal{L}\| := \sup_{u \in \mathcal{H}^s_0(\Omega)} \frac{\|\mathcal{L} u\|_{\mathcal{H}^{-s}(\Omega)}}{\|u\|_{\mathcal{H}^s_0(\Omega)}}$, and $\|\mathcal{L}^{-1}\| := \sup_{u \in \mathcal{H}^s_0(\Omega)} \frac{\|u\|_{\mathcal{H}^s_0(\Omega)}}{\|\mathcal{L} u\|_{\mathcal{H}^{-s}(\Omega)}}$.    (2.2.35)

Proof. (1) and (2) follow from an application of Prop. 4.17 and Theorems 4.14 and 3.19 of [100]. (3) follows from Thm. 2.23 of [100]. (4) follows from the complexity analysis of Alg. 6 of [100]. See [101] for detailed proofs.

Remark 2.2.6. The wavelets $\psi^{(k)}_i, \chi^{(k)}_i$ and stiffness matrices $A^{(k)}, B^{(k)}$ can also be computed in $O(N \log^2 N \log^{2d} \frac{N}{\epsilon})$ complexity using the incomplete Cholesky factorization approach of [119].

Theorem 2.2.5.2-3 implies that the gamblets are localized both in the eigenspace of the operator $\mathcal{L}$ and in $\Omega$. Further, Theorem 2.2.5.1 shows that the accuracy of the recovery $u^{(k)}$ in the energy norm is bounded by the $L^2$ norm of $\mathcal{L} u$. This result is used in the proof of the denoising results presented in the following section.

2.3 Denoising by truncating the gamblet transform

Near minimax recovery

In this section, we will present the result that truncating the gamblet transform of $\eta = u + \zeta$ in a discrete variant of Problem 5 produces an approximation of $u$ that is minimax optimal up to a multiplicative constant [153, Sec. 4], i.e., near minimax.

The discretized version of $\mathcal{H}^s_0(\Omega)$ is the finite-dimensional space spanned by the gamblet wavelets, using the pre-Haar measurement functions defined in Sec. 2.2, taken to the $q$-th level⁶. In addition, the discrete noise used in this problem, $\zeta \in \Psi^{(q)}$, is the projection of the noise (2.1.5) onto $\Psi^{(q)}$ (due to (2.2.8)).

⁶Note there are no mathematical constraints on the number of levels taken in the decomposition.

Problem 6. Let $u$ be an unknown element of $\Psi^{(q)} \subset \mathcal{H}^s_0(\Omega)$ for $q < \infty$. Let $\zeta$ be a centered Gaussian vector in $\Psi^{(q)}$ such that

$\mathbb{E}\big[ [\phi^{(q)}_i, \zeta]\, [\phi^{(q)}_j, \zeta] \big] = \sigma^2 \delta_{i,j}$.    (2.3.1)

Given the noisy observation $\eta = u + \zeta$ and a prior bound $M$ on $\|\mathcal{L} u\|_{L^2}$, find an approximation of $u$ in $\Psi^{(q)}$ that is as accurate as possible in the energy norm $\|\cdot\|$.

To justify this discrete approximation, recall that by Theorem 2.2.5 we have $\| u - u^{(q)} \| \le C h^{q s} \| \mathcal{L} u \|_{L^2(\Omega)}$. Hence, with the prior bound on $\|\mathcal{L} u\|_{L^2}$, this approximation is arbitrarily accurate for $q$ large enough. Let $\eta$ be as in Problem 6 and let the gamblets be defined as in Section 2.2 with pre-Haar measurement functions. For $l \in \{1, \ldots, q\}$, let

$\eta^{(l)} := \sum_{k=1}^{l} [\phi^{(k),\chi}, \eta] \cdot \chi^{(k)}$    (2.3.2)

and $\eta^{(0)} = 0 \in \Psi^{(q)}$. When $l < q$, $\eta^{(l)}$ is a truncation of the full gamblet transform $\eta = \eta^{(q)} = [\phi^\chi, \eta] \cdot \chi$. Let $M > 0$ and write

$V^{(q)}_M = \{ u \in \Psi^{(q)} \mid \|\mathcal{L} u\|_{L^2(\Omega)} \le M \}$.    (2.3.3)

Assume that $\sigma > 0$ and write

$l^\dagger = \mathrm{argmin}_{l \in \{0, \ldots, q\}} \beta_l$,    (2.3.4)

for

$\beta_l = \begin{cases} h^{2s} M^2 & \text{if } l = 0 \\ \sigma^2 h^{-(2s+d)l} + h^{2s(l+1)} M^2 & \text{if } 1 \le l \le q-1 \\ h^{-(2s+d)q} \sigma^2 & \text{if } l = q . \end{cases}$    (2.3.5)
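A direct transcription of the level selection (2.3.4)-(2.3.5) reads as follows; it assumes $\sigma$, $M$, $h$, $s$, $d$, $q$ are known, as in Problem 6.

```python
import numpy as np

def truncation_level(sigma, M, h, s, d, q):
    """l_dagger = argmin_l beta_l with beta_l as in (2.3.5)."""
    beta = np.empty(q + 1)
    beta[0] = h ** (2 * s) * M ** 2
    for l in range(1, q):
        beta[l] = sigma ** 2 * h ** (-(2 * s + d) * l) + h ** (2 * s * (l + 1)) * M ** 2
    beta[q] = sigma ** 2 * h ** (-(2 * s + d) * q)
    return int(np.argmin(beta))
```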

The following theorem asserts thatπœ‚(𝑙

†) is a near minimax recovery of𝑒, by which we mean that the k Β· k2 recovery error is minimax optimal up to a multiplicative constant (depending only on kL k,kLβˆ’1k,Ξ©, 𝑑 , 𝛿 and whose value can be made explicit using the estimates of [101]). We will also refer to πœ‚(𝑙

†)

as the smooth recovery of𝑒 because, with probability close to 1, it is nearly as regular in energy norm as𝑒.

Theorem 2.3.1. Suppose $v^\dagger(\eta) = \eta^{(l^\dagger)}$; then there exists a constant $C$ depending only on $h$, $s$, $\|\mathcal{L}\|$, $\|\mathcal{L}^{-1}\|$, $\Omega$, $d$, and $\delta$ such that

$\sup_{u \in V^{(q)}_M} \mathbb{E}\big[ \| u - v^\dagger(\eta) \|^2 \big] < C \inf_{v(\eta)} \sup_{u \in V^{(q)}_M} \mathbb{E}\big[ \| u - v(\eta) \|^2 \big]$,    (2.3.6)

where the infimum is taken over all measurable functions $v : \Psi^{(q)} \to \Psi^{(q)}$. Furthermore, if $l^\dagger \neq 0$, then with probability at least $1 - \varepsilon$,

$\| \eta^{(l^\dagger)} \| \le \| u \| + C \sqrt{\log \tfrac{1}{\varepsilon}}\; \sigma^{\frac{2s+d}{4s+d}} M^{\frac{2s+2d}{4s+d}}$.    (2.3.7)

Proof. See [153, Sec. 7].

Note that 𝑙† = π‘ž occurs (approximately) when π‘ž is such that β„Žπ‘ž > (𝜎

𝑀)4𝑠2+𝑑, i.e., when

𝜎 𝑀

< β„Žπ‘ž

4𝑠+𝑑

2 , (2.3.8)

and in this case πœ‚(π‘ž) is a near minimax optimal recovery of 𝑒(π‘ž). On the other extreme𝑙†=0 occurs (approximately) when(𝜎

𝑀)4𝑠2+𝑑 > β„Ž, i.e., when 𝜎

𝑀

> β„Ž

4𝑠+𝑑

2 , (2.3.9)

and in this case, the zero signal is a near optimal recovery. The signal-to-noise ratio determines which hierarchical level the truncation occurs. This represents the length-scale of the Gaussian conditioning. This can be seen in

πœ‚(π‘˜) =E πœ‰

[πœ™(π‘˜), πœ‰] = [πœ™(π‘˜), πœ‚]

, (2.3.10)

which is a conditioning with the level π‘˜ hierarchical pre-Haar wavelets πœ™(π‘˜). The trade-off between recovering an overly smooth or noisy signal is illustrated in Fig.2.2.

Numerical illustrations

Example 2.1.1 with $d = 1$

Figure 2.1: [153, Fig. 1], the plots of $a$, $f$, $u$, $\eta$, the near minimax recovery $v(\eta) = \eta^{(l^\dagger)}$, its error from $u$, and the derivatives of $u$ and $v(\eta)$.

Figure 2.2: A comparison of $\eta^{(l)}$. In this example $l^\dagger = 4$.

Consider Example 2.1.1 with $d = 1$. Take $\Omega = [0, 1] \subset \mathbb{R}$, $q = 10$ and $\phi^{(k)}_i = 1_{[\frac{i-1}{2^k}, \frac{i}{2^k}]}$ for $1 \le i \le 2^k$. Let $W^{(k)}$ be the $2^{k-1}$ by $2^k$ matrix with non-zero entries defined by $W_{i, 2i-1} = \frac{1}{\sqrt{2}}$ and $W_{i, 2i} = -\frac{1}{\sqrt{2}}$. Let $\mathcal{L} := -\mathrm{div}(a \nabla \cdot)$ with

$a(x) := \prod_{k=1}^{10} \big( 1 + 0.25 \cos(2^k x) \big)$.    (2.3.11)

In Fig. 2.1 we select $f(x)$ at random uniformly over the unit $L^2(\Omega)$-sphere of $\Phi^{(q)}$ and let $\zeta$ be white noise (as in (2.1.5)) with $\sigma = 0.001$ and $\eta = u + \zeta$.
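For reference, the pre-Haar measurement functions and the matrices $W^{(k)}$ of this $d = 1$ example can be assembled as below; the grid resolution is our choice, and the columns are $L^2$-normalized as in (2.2.31) (the unnormalized indicators above differ only by the constant factor $2^{k/2}$).

```python
import numpy as np

def pre_haar_1d(k, N=1024):
    """Columns are the L^2-normalized indicators of [(i-1)/2^k, i/2^k] sampled on an
    N-point grid, following (2.2.31)."""
    x = (np.arange(N) + 0.5) / N
    Phi = np.zeros((N, 2 ** k))
    for i in range(2 ** k):
        Phi[:, i] = ((x >= i / 2 ** k) & (x < (i + 1) / 2 ** k)) * 2 ** (k / 2)
    return Phi

def haar_W(k):
    """The 2^(k-1) x 2^k matrix with W[i, 2i] = 1/sqrt(2), W[i, 2i+1] = -1/sqrt(2)
    (0-based indexing), matching the W^(k) defined above."""
    W = np.zeros((2 ** (k - 1), 2 ** k))
    idx = np.arange(2 ** (k - 1))
    W[idx, 2 * idx] = 1 / np.sqrt(2)
    W[idx, 2 * idx + 1] = -1 / np.sqrt(2)
    return W
```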

We next consider a case where $f$ is smooth, i.e., $f(x) = \frac{\sin(\pi x)}{x}$ on $x \in (0, 1]$ and $f(0) = \pi$. Let $\zeta$ be white noise with standard deviation $\sigma = 0.01$. See Fig. 2.3 for the corresponding numerical illustrations.

Both figures show that (1) $v(\eta)$ and $\nabla v(\eta)$ are accurate approximations of $u$ and $\nabla u$, and (2) the accuracy of these approximations increases with the regularity of $f$.

Figure 2.3: [153, Fig. 2], the plots of $a$, smooth $f$, $u$, $\eta$, $v(\eta) = \eta^{(l^\dagger)}$, its error from $u$, and the derivatives of $u$ and $v(\eta)$.

Example 2.1.1 with $d = 2$

Consider Example 2.1.1 with $d = 2$. Take $\Omega = [0, 1]^2$ and $q = 7$. Use the pre-Haar wavelets defined as $\phi^{(k)}_{i,j} = 1_{[\frac{i-1}{2^k}, \frac{i}{2^k}] \times [\frac{j-1}{2^k}, \frac{j}{2^k}]}$ for $1 \le i, j \le 2^k$. Let $W^{(k)}$ be the $3(4^{k-1})$ by $4^k$ matrix defined as in Construction 4.13 of [99].

In Fig. 2.4 we select $f(x)$ at random uniformly over the unit $L^2(\Omega)$-sphere of $\Phi^{(q)}$ and let $\zeta$ be white noise (as in (2.1.5)) with $\sigma = 0.001$ and $\eta = u + \zeta$.

Figure 2.4: [153, Fig. 3], the plots of $a$, $f$, $u$, $\eta$, $v(\eta) = \eta^{(l^\dagger)}$, its error from $u$, and the gradients of $u$ and $v(\eta)$.

Figure 2.5: [153, Fig. 4], the plots of $a$, smooth $f$, $u$, $\eta$, $v(\eta) = \eta^{(l^\dagger)}$, its error from $u$, and the gradients of $u$ and $v(\eta)$.

LetL =βˆ’div(π‘Žβˆ‡Β·)with π‘Ž(π‘₯ , 𝑦) :=

7

Γ–

π‘˜=1

h 1+ 1

4cos(2π‘˜πœ‹(π‘₯+𝑦)

1+ 1

4cos(2π‘˜πœ‹(π‘₯βˆ’3𝑦) i .

(2.3.12)

Next consider a case where $f$ is smooth, i.e., $f(x, y) = \cos(3x + y) + \sin(3y) + \sin(7x - 5y)$. Let $\zeta$ be white noise with standard deviation $\sigma = 0.01$. See Fig. 2.5 for the corresponding numerical illustrations. As with the $d = 1$ plots, the $d = 2$ plots show the accuracy of the recovery of $u$ and $\nabla u$ and the positive impact of the regularity of $f$ on that accuracy.

2.4 Comparisons

Hard- and soft-thresholding

Since hard- and soft-thresholding have been used by Donoho and Johnstone [36–38] for the near minimax recovery of regular signals, we will compare the accuracy of (2.3.2) with that of hard- and soft-thresholding the gamblet transform of the noisy signal [153, Sec. 5]. We call hard-thresholding the recovery of $u$ with

𝑣(πœ‚)=

π‘ž

Γ•

π‘˜=1

Γ•

π‘–βˆˆJ(π‘˜)

𝐻𝑑

(π‘˜)( [πœ™(

π‘˜), πœ’ 𝑖

, πœ‚])πœ’(π‘˜)

𝑖 (2.4.1)

and

𝐻𝛽(π‘₯) =





ο£²



ο£³

π‘₯ |π‘₯| > 𝛽 0 |π‘₯| ≀ 𝛽 .

(2.4.2) We callsoft-thresholdingthe recovery of𝑒with

𝑣(πœ‚) =

π‘ž

Γ•

π‘˜=1

Γ•

π‘–βˆˆJ(π‘˜)

𝑆𝑑

(π‘˜)

( [πœ™(π‘˜), πœ’

𝑖 , πœ‚])πœ’(π‘˜)

𝑖 (2.4.3)

and

𝑆𝛽(π‘₯) =





ο£²



ο£³

π‘₯βˆ’π›½sgn(π‘₯) |π‘₯| > 𝛽

0 |π‘₯| ≀ 𝛽 .

(2.4.4)

The parameters $(t^{(1)}, \ldots, t^{(q)})$ are adjusted to achieve minimal average errors. Since the mass matrix of the $\phi^\chi_i$ is comparable to the identity (see [153, Thm. 10]) and the bi-orthogonality identities $[\phi^\chi_i, \chi_j] = \delta_{i,j}$ hold, $[f, \chi]$ is approximately uniformly sampled on the unit sphere of $\mathbb{R}^{\mathcal{J}}$ and the variance of $[f, \chi^{(k)}_i]$ can be approximated by $1/|\mathcal{J}|$. Therefore $[\phi^{(k),\chi}, u] = B^{(k),-1} [f, \chi^{(k)}]$ and (2.2.33) imply that the standard deviation of the entries of $[\phi^{(k),\chi}, u]$ can be approximated by $h^{-2ks} / \sqrt{|\mathcal{J}|}$. Therefore optimal choices for the threshold on the $k$-th hierarchical level follow the power law $t^{(k)} = h^{-2ks} t_0$ for some parameter $t_0$.
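A sketch of the resulting level-dependent thresholding of the gamblet coefficients is given below; the arrays `coeffs` and `levels` are assumed to hold the coefficients $[\phi^{(k),\chi}_i, \eta]$ and their levels $k$, and $t_0$ is the free parameter tuned for minimal average error.

```python
import numpy as np

def threshold_gamblet_coeffs(coeffs, levels, h, s, t0, soft=False):
    """Hard- or soft-threshold gamblet coefficients with the level-dependent
    thresholds t^(k) = h^(-2ks) t0, following (2.4.1)-(2.4.4)."""
    out = np.array(coeffs, dtype=float)
    t = t0 * h ** (-2.0 * s * np.asarray(levels))
    if soft:
        out = np.sign(out) * np.maximum(np.abs(out) - t, 0.0)   # (2.4.4)
    else:
        out[np.abs(out) <= t] = 0.0                             # (2.4.2)
    return out
```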

Regularization

We call regularization the recovery of $u$ with $v(\eta)$ defined as the minimizer of

$\| v(\eta) - \eta \|^2_{L^2(\Omega)} + \alpha \| v(\eta) \|^2$.    (2.4.5)

For practical implementation, we consider $A_{i,j} = \langle \tilde{\psi}_i, \tilde{\psi}_j \rangle$, the $N \times N$ stiffness matrix obtained by discretizing $\mathcal{L}$ with finite elements $\tilde{\psi}_1, \ldots, \tilde{\psi}_N$, and write $\eta = \sum_{i=1}^{N} y_i \tilde{\psi}_i$
