On the Existence of Densities for Functional Data and their Link to Statistical Privacy
Ardalan Mirshani
Matthew Reimherr
Aleksandra Slavković∗
Abstract
In statistical privacy (or statistical disclosure control) the goal is to minimize the potential for identification of individual records or sensitive characteristics while at the same time ensuring that the released information provides accurate and valid statistical inference. Differential Privacy, DP, has emerged as a mathematically rigorous definition of risk and, more broadly, as a framework for releasing privacy enhanced versions of a statistical summary. This work develops an extensive theory for achieving DP with functional data or function valued parameters more generally. Functional data analysis, FDA, as well as other branches of statistics and machine learning often deal with function valued parameters. Functional data and/or functional parameters may contain unexpectedly large amounts of personally identifying information, and thus developing a privacy framework for these areas is critical in the era of big data. Our theoretical framework is based on densities over function spaces, which is of independent interest to FDA researchers, as densities have proven to be challenging to define and utilize for FDA models. Of particular interest to researchers working in statistical disclosure control, we demonstrate how even small amounts of over smoothing or regularizing can produce releases with substantially improved utility. We carry out extensive simulations to examine the utility of privacy enhanced releases and consider an application to Multiple Sclerosis and Diffusion Tensor Imaging.
KEYWORDS: Differential Privacy, Functional Data, Hilbert Space, Density Estimation, Reproducing Kernel Hilbert Space
1 Introduction
New studies, surveys, and technologies are resulting in ever richer and more informative data
sets. Data being collected as part of the “big data revolution” occurring in the sciences have
dramatically expanded the pace of scientific progress over the last several decades, but often contain a significant amount of personal or subject level information. These data and their corresponding analyses present substantial challenges for preserving subjects' privacy as researchers attempt to understand what information can be publicly released without impeding scientific advancement and policy making (Lane et al., 2014).

∗Address for correspondence: Matthew Reimherr, Department of Statistics, Pennsylvania State University, 411 Thomas Building, University Park, PA 16802, USA. E-mail: [email protected]
One type of big data that has been heavily researched in the statistics community over the last two decades is functional data, with the corresponding branch of statistics called functional data analysis, FDA. FDA is concerned with conducting statistical inference on samples of functions, trajectories, surfaces, and other similar objects. Such tools have become increasingly necessary as our data gathering technologies become more sophisticated. FDA methods have proven very useful in a wide variety of fields including economics, finance, genetics, geoscience, anthropology, and kinesiology, to name only a few (Ramsay and Silverman, 2002, 2005; Ferraty and Romain, 2011; Horváth and Kokoszka, 2012; Kokoszka and Reimherr, 2017). Indeed, nearly any data rich area of science will eventually come across applications that are amenable to FDA techniques. However, functional data are also a rich source of potentially personally identifiable information, e.g., genomic data, including genomic sequences (Erlich and Narayanan, 2014; Schadt et al., 2012), and biomedical imaging (Kulynych, 2002).
To date, there has been very little work concerning FDA and statistical data privacy, that is, statistical disclosure control (SDC) or statistical disclosure limitation (SDL), the branch of statistics concerned with limiting identifying information in released data and summaries while maintaining their utility for valid statistical inference. SDC has a rich history of both methodological developments and applications for the "safe" release of altered (or masked) microdata and tabular data; see, e.g., Dalenius (1977); Rubin (1993); Willenborg and De Waal (1996); Fienberg and Slavković (2010); Hundepool et al. (2012) for general reviews of the area and the related methodology. Hall et al. (2013) provide the most substantial contribution to privacy of FDA data to date; the authors are primarily interested in nonparametric smoothing, releasing point-wise evaluations of functions with a particular focus on kernel density estimators. They work within the framework of Differential Privacy, DP, which emerged from theoretical computer science as a rigorous quantification of disclosure risk (Dwork, 2006; Dwork et al., 2006b). One of the major findings of Hall et al. (2013) is the connection between DP and Reproducing Kernel Hilbert Spaces. This connection will be explored here in even greater depth. Recently, Aldà and Rubinstein (2017) extended this work by considering a Laplace (instead of a Gaussian) mechanism and focused on releasing an approximation based on Bernstein polynomials, exploiting their close connection to point-wise evaluation on a grid or mesh.
In this work, we move beyond Hall et al. (2013) by establishing DP for functional data more broadly. We propose a mechanism for achieving DP when releasing any set of linear functionals of a functional object, which includes point-wise releases as a special case. We will show that this mechanism also protects any nonlinear function of the functional object, or even a full function release. To establish the necessary probabilistic bounds for DP we utilize functional densities. This represents a major step for FDA, as densities for functional data are sometimes thought not to exist (Delaigle and Hall, 2010). Lastly, we demonstrate these tools by estimating the mean function via RKHS smoothing.

One of the major findings of this work is the interesting connection between regularization and privacy. In particular, we show that by slightly over smoothing, one can achieve DP with substantially less noise, thus better preserving the utility of the release. It turns out that a great deal of personal information can reside in the "higher frequencies" of a functional parameter estimate. To more fully illustrate this point, we demonstrate how cross-validation for choosing smoothing parameters can be dramatically improved when the cross-validation incorporates the function to be released. Previous works concerning DP and regularization have primarily focused on performing shrinkage regression in a DP manner (e.g., Kifer et al. (2012); Chaudhuri et al. (2011)), but not on using the smoothing to recover some utility as we propose here.
The remainder of the paper is organized as follows. In Section 2 we present the necessary background material for DP and FDA. In Section 3 we present our primary results concerning releasing a finite number of linear functionals, followed by full function and nonlinear releases. We also present our work related to densities for FDA, which is of independent interest. In Section 4 we derive global sensitivity bounds for RKHS smoothing, which are needed to implement our mechanism. In Section 5 we present simulations to highlight the role of different parameters, while in Section 6 we present an application to Diffusion Tensor Imaging of Multiple Sclerosis patients. In Section 7 we discuss our results and present concluding remarks. All mathematical proofs and derivations can be found in the Appendix.
2 Background

2.1 Differential Privacy
Let $D$ be a countable (potentially infinite) population of records, and denote by $\mathcal{D}$ the collection of all $n$-dimensional subsets of $D$. Throughout we let $D$ and $D'$ denote elements of $\mathcal{D}$. Notationally, we omit the dependence on $n$ for ease of exposition. Differential Privacy, DP, was first introduced by Dwork (2006); Dwork et al. (2006b) as a criterion for evaluating the efficacy of a privacy mechanism. In this work we use a form called $(\alpha, \beta)$-DP (also commonly referred to as $(\epsilon, \delta)$-DP), where $\alpha$ and $\beta$ are privacy parameters with smaller values indicating stronger privacy. DP is a property of the privacy mechanism applied to the data summary, in this case $f_D := f(D)$, prior to release. For simplicity, we will denote the application of this mechanism using a tilde; so $\tilde{f}_D := \tilde{f}(D)$ is the privacy enhanced version of $f_D$. Probabilistically, we view $\tilde{f}_D$ as a random variable indexed by $D$ (which is not treated as random). This criterion can be defined for any probability space.
Definition 1. Let $f : \mathcal{D} \to \Omega$, where $\Omega$ is some metric space. Let $\tilde{f}_D$ be random variables, indexed by $D$, taking values in $\Omega$ and representing the privacy mechanism. The privacy mechanism is said to achieve $(\alpha, \beta)$-DP if for any two datasets, $D$ and $D'$, which differ in only one record, we have
$$P(\tilde{f}_D \in A) \le e^{\alpha} P(\tilde{f}_{D'} \in A) + \beta,$$
for any measurable set $A$.
The most common setting is when $\Omega = \mathbb{R}$, i.e., the summary is real valued. In Section 3.1 we will consider the case when $\Omega = H$, a real separable Hilbert space. At a high level, achieving $(\alpha, \beta)$-DP means that the object to be released changes relatively little if the sample on which it is based is perturbed slightly. This change is related to what Dwork (2006) called "sensitivity". Another nice feature is that if $\tilde{f}_D$ achieves DP, then so does any measurable transformation of it; see Dwork et al. (2006a,b) for the original results, Wasserman and Zhou (2010) for its statistical framework, and Dwork and Roth (2014) for a more recent detailed review of relevant DP results. For convenience, we restate the lemma and its proof with respect to our notation in Appendix A.
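For intuition, the scalar Gaussian mechanism underlying this framework is easy to sketch in code. The following is our own illustration, not code from the paper; it uses the noise calibration $\delta^2 = 2\log(2/\beta)\Delta^2/\alpha^2$ that appears later in Theorem 1, and the bounded-data mean release with its $1/N$ sensitivity is a standard textbook example.

```python
import numpy as np

def gaussian_mechanism(summary, sensitivity, alpha, beta, rng=None):
    """Release a real-valued summary with (alpha, beta)-DP by adding
    Gaussian noise scaled to the summary's global sensitivity."""
    rng = np.random.default_rng() if rng is None else rng
    # Noise calibration: delta^2 = 2 log(2/beta) * sensitivity^2 / alpha^2.
    delta = np.sqrt(2.0 * np.log(2.0 / beta)) * sensitivity / alpha
    return summary + delta * rng.standard_normal()

# Example: privately release a sample mean of values known to lie in [0, 1].
# Replacing one record changes the mean by at most 1/N, so sensitivity = 1/N.
data = np.array([0.2, 0.4, 0.9, 0.5, 0.7])
release = gaussian_mechanism(data.mean(), sensitivity=1.0 / len(data),
                             alpha=1.0, beta=0.1)
```

Smaller $\alpha$ or $\beta$ inflates $\delta$, trading utility for stronger privacy, exactly the trade-off explored in the simulations of Section 5.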
2.2 Functional Data Analysis
Here we provide a few details concerning FDA that will be important later on. Throughout we let $H$ denote a real separable Hilbert space, i.e., a complete inner product space that admits a countable basis. The most commonly encountered space in FDA is $L^2[0,1]$, though many other spaces are of increasing interest: for example, multivariate or panel functional data, functional data over multidimensional domains (such as space-time), as well as spaces which incorporate more structure on the functions, such as Sobolev spaces and Reproducing Kernel Hilbert Spaces.

Throughout, we will let $\theta : \mathcal{D} \to H$ denote the particular summary of interest and, for notational ease, we define $\theta_D := \theta(D)$. In Section 3.1 we consider the case where we want to release a finite number of continuous linear functionals of $\theta_D$, whereas in Section 3.2 we consider releasing a privacy enhanced version of the entire function or some nonlinear transformation of it (such as a norm).
The backbone of our privacy mechanism is the same as in Hall et al. (2013), and is used extensively across the DP literature. In particular, we will add Gaussian noise to the summary and show how the noise can be calibrated to achieve DP. We say $Z$ is a Gaussian process in $H$ if $\langle Z, h \rangle_H$ is Gaussian in $\mathbb{R}$, for any $h \in H$. Every Gaussian process in $H$ is uniquely parametrized by its mean $\mu \in H$ and its covariance operator (which is symmetric, positive definite, and nuclear) $C : H \to H$ (Laha and Rohatgi, 1979, Page 488). One then has that
$$\langle Z, h \rangle_H \sim N(\langle \mu, h \rangle_H, \langle C(h), h \rangle_H),$$
for any $h \in H$. We use the shorthand notation $N$ to denote the Gaussian distribution over $\mathbb{R}$, but include subscripts for any other space, e.g., $N_H$ for $H$.
Let $K$ be a symmetric, positive definite, bounded linear operator over $H$ that admits a spectral decomposition given by
$$K = \sum_{i=1}^{\infty} \lambda_i v_i \otimes v_i, \qquad (1)$$
where $\otimes$ denotes the tensor product between two elements of $H$, the result of which is an operator, $(x \otimes y)(h) := \langle y, h \rangle_H x$. The $\lambda_i$ and $v_i$ denote the eigenvalues and eigenfunctions, respectively, of $K$. Without loss of generality, one can assume that the $v_i$ form an orthonormal basis of $H$. The operator $K$ can be used to define a subspace of $H$ that is also a Hilbert space. In particular,
$$\mathcal{K} := \left\{ h \in H : \sum_{i=1}^{\infty} \frac{\langle h, v_i \rangle^2}{\lambda_i} < \infty \right\}. \qquad (2)$$
It is usually assumed that $\lambda_i \neq 0$; however, the case where $\lambda_i = 0$ for some $i$ is readily included with the convention $0/0 = 0$ in the above summation. In that case, it is understood that if $\lambda_i = 0$ then $\langle h, v_i \rangle = 0$ for all $h \in \mathcal{K}$. For example, if only a finite number of $\lambda_i$ were nonzero then $\mathcal{K}$ would be finite dimensional. The space $\mathcal{K}$ is a Hilbert space when equipped with the inner product $\langle x, y \rangle_{\mathcal{K}} = \sum \lambda_i^{-1} \langle x, v_i \rangle \langle y, v_i \rangle$. When $H = L^2[0,1]$ and $K$ is an integral operator with kernel $k(t,s)$, then $\mathcal{K}$ is also a Reproducing Kernel Hilbert Space, RKHS (Berlinet and Thomas-Agnan, 2004).
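The spectral construction in (1) and (2) can be mimicked numerically. Below is a small sketch of our own (not from the paper) that eigendecomposes a kernel Gram matrix on a uniform grid, a stated simplification of the integral operator, and evaluates the $\mathcal{K}$-norm in (2), dropping numerically zero eigenvalues per the $0/0 = 0$ convention.

```python
import numpy as np

def rkhs_norm_sq(f_vals, grid, kernel, tol=1e-10):
    """Approximate ||f||_K^2 = sum_i <f, v_i>^2 / lambda_i using the
    eigendecomposition of the discretized integral operator (1)-(2)."""
    w = grid[1] - grid[0]                       # quadrature weight (uniform grid)
    M = kernel(grid[:, None], grid[None, :]) * w  # discretized operator K
    lam, V = np.linalg.eigh(M)                  # eigenvalues / eigenvectors
    scores = (V.T @ f_vals) * w                 # approximate <f, v_i> in L^2
    keep = lam > tol                            # convention 0/0 = 0: drop lambda_i ~ 0
    return float(np.sum(scores[keep] ** 2 / (lam[keep] * w)))

# Usage: for the rank-one kernel k(t,s) = 2 sin(pi t) sin(pi s), the only
# eigenpair is (1, sqrt(2) sin(pi t)), so f = 3 v has ||f||_K^2 = 9.
grid = np.linspace(0.0, 1.0, 101)
v = np.sqrt(2.0) * np.sin(np.pi * grid)
print(rkhs_norm_sq(3.0 * v, grid,
                   lambda t, s: 2.0 * np.sin(np.pi * t) * np.sin(np.pi * s)))
```

A diverging value of this sum is exactly the compatibility failure discussed in Section 3 below: the function fails to lie in $\mathcal{K}$.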
3 Privacy Enhanced Functional Data
In this section we present our main results. The mechanism we use for guaranteeing differential privacy is to add Gaussian noise before releasing $\theta_D$. In other words, our releases will be based on a private version $\tilde{\theta}_D$, where $\tilde{\theta}_D = \theta_D + \delta Z$. However, it turns out that not just any Gaussian noise, $Z$, can be used. In particular, the options for choosing $Z$ depend heavily on the summary $\theta_D$. This is made explicit in the following definition.

Definition 2. We say that the summary $\theta_D$ is compatible with a Gaussian noise, $Z \sim N_H(0, C)$, if $\theta_D \in \mathcal{K}$ for any $D \in \mathcal{D}$, where $\mathcal{K}$ is defined in (2) with kernel $K = C$.

In other words, for a particular noise to be viable for the privacy mechanism, it must be the case that the summary resides in the weighted subspace derived from the covariance operator of the noise. Recall that every covariance operator is compact, symmetric, and positive definite and thus admits the expansion (1). Our next definition comes from Hall et al. (2013), but we state it here more generally.
Definition 3. The global sensitivity of a summary $\theta_D$, with respect to a Gaussian noise $Z \sim N_H(0, C)$, is given by
$$\Delta^2 = \sup_{D' \sim D} \|\theta_D - \theta_{D'}\|_{\mathcal{K}}^2,$$
where $D' \sim D$ means the two sets differ at one record only. The space, $\mathcal{K}$, is defined in (2) with kernel $K = C$.
The global sensitivity (GS) is a central quantity in the theory of DP. In particular, the amount of noise, $\delta Z$, depends directly on the global sensitivity. It should be noted that there are other notions of "sensitivity" in the DP literature, such as local sensitivity, but here our focus is on the global sensitivity that typically leads to the worst case definition of risk under DP; for a detailed review of DP theory and concepts, see Dwork and Roth (2014). If a summary is not compatible with a noise, then it is possible for the global sensitivity to be infinite, in which case no finite amount of noise would be able to preserve privacy in the sense of satisfying DP. The following example illustrates this point: if the summaries do not all lie in $\mathcal{K}$, then no level of noise can be added that guarantees scalar DP holds uniformly across $\langle \theta_D, h \rangle$ for any $h$.
Example 1. Let $Z$ be Brownian motion over $[0,1]$, which means that the kernel of $C$ is given by
$$c(t,s) = \min\{t, s\}, \qquad t, s \in [0,1].$$
The kernel, $c$, admits the following spectral decomposition:
$$c(t,s) = \sum_{j=1}^{\infty} \frac{2}{(j-1/2)^2 \pi^2} \sin((j-1/2)\pi t) \sin((j-1/2)\pi s),$$
so that
$$\lambda_j = \frac{1}{(j-1/2)^2 \pi^2} \quad \text{and} \quad v_j(t) = \sqrt{2} \sin((j-1/2)\pi t).$$
Let $D$ consist of $N$ iid realizations of the following process:
$$X_i(t) = Y_i \sum_{j=1}^{\infty} \frac{\sqrt{2}}{(j-1/2)^{3/4} \pi} \sin((j-1/2)\pi t), \qquad i = 1, \dots, N,$$
where the $Y_i$ are iid standard normal. Then we have that
$$\|X_i\|_H^2 = Y_i^2 \sum_{j=1}^{\infty} \frac{1}{(j-1/2)^{3/2} \pi^2} < \infty \quad \text{but} \quad \|X_i\|_{\mathcal{K}}^2 = Y_i^2 \sum_{j=1}^{\infty} \frac{(j-1/2)^2}{(j-1/2)^{3/2}} = \infty.$$
So the $X_i$ are not in the RKHS, $\mathcal{K}$, defined using the kernel $c(t,s)$. Now suppose that $\theta_D = X$ is a single element of $D$, and we aim to release a privacy enhanced version of $f_D := \langle \theta_D, v_j \rangle$. Using classic results from univariate DP (e.g., Dwork et al., 2006a), a private version would be
$$\langle X, v_j \rangle_H + \delta_j \langle Z, v_j \rangle_H,$$
where
$$\delta_j^2 = \frac{2 \log(2/\beta)}{\alpha^2} \Delta_j^2 \quad \text{and} \quad \Delta_j^2 = \sup_{i,k} \frac{\langle X_i - X_k, v_j \rangle_H^2}{\langle C(v_j), v_j \rangle_H}.$$
Here $\Delta_j$ denotes the scalar global sensitivity of $\langle X, v_j \rangle$ with respect to $\langle Z, v_j \rangle$. We can express
$$\sup_{i,k} \frac{\langle X_i - X_k, v_j \rangle_H^2}{\langle C(v_j), v_j \rangle_H} = \sup_{i,k} (Y_i - Y_k)^2 \frac{(j-1/2)^2}{(j-1/2)^{3/2}} = (j-1/2)^{1/2} \sup_{i,k} (Y_i - Y_k)^2.$$
However, this quantity goes to infinity as $j \to \infty$. Therefore, there is no level of noise, $\delta$, we can add to $f_D$ which would guarantee $(\alpha, \beta)$-DP uniformly across the choice of $h$.
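Example 1 can be checked numerically. The sketch below (our own illustration, with hypothetical $Y_i$ draws) evaluates the per-frequency scalar sensitivity $\Delta_j^2$ directly from the scores and eigenvalues given above, and shows its $(j-1/2)^{1/2}$ growth.

```python
import numpy as np

# Under Brownian-motion noise, lambda_j = 1/((j-1/2)^2 pi^2) and the process
# X_i has scores <X_i, v_j> = Y_i / ((j-1/2)^(3/4) pi), so
#   Delta_j^2 = sup_{i,k} <X_i - X_k, v_j>^2 / lambda_j
#             = (j - 1/2)^(1/2) * sup_{i,k} (Y_i - Y_k)^2,
# which diverges as j grows.
def delta_j_sq(j, y_vals):
    a = j - 0.5
    score_gap_sq = (y_vals.max() - y_vals.min()) ** 2 / (a ** 1.5 * np.pi ** 2)
    lam_j = 1.0 / (a ** 2 * np.pi ** 2)
    return score_gap_sq / lam_j          # equals sqrt(a) * sup (Yi - Yk)^2

y = np.array([-1.3, 0.2, 0.8, 2.1])      # hypothetical Y_i draws
for j in [1, 10, 100, 1000]:
    print(j, delta_j_sq(j, y))           # grows without bound in j
```

No single $\delta$ can dominate every $\Delta_j$, which is the compatibility failure the example is designed to exhibit.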
3.1 Releasing Finite Projections
We begin with the simpler task of releasing a finite vector of continuous linear functionals of $\theta_D$. By the Riesz Representation Theorem, every continuous linear functional, $T(\cdot)$, over $H$ is of the form $T(\cdot) = \langle \cdot, h \rangle_H$. Thus, equivalently, we aim to release a private version of $\{\langle \theta_D, h_1 \rangle_H, \dots, \langle \theta_D, h_N \rangle_H\}$ for some finite number $N$ and fixed $\{h_i \in H\}_{i=1}^N$.

Theorem 1. Assume $\theta_D$ is compatible with $Z \sim N_H(0, C)$, and define
$$\tilde{\theta}_D = \theta_D + \delta Z \quad \text{with} \quad \delta^2 = \frac{2 \log(2/\beta)}{\alpha^2} \Delta^2.$$
Now define $f : \mathcal{D} \to \mathbb{R}^N$ as $f_D := \{\langle \theta_D, h_i \rangle\}$ and $\tilde{f}_D := \{\langle \tilde{\theta}_D, h_i \rangle\}$, for $\{h_i \in H\}_{i=1}^N$. Then $\tilde{f}_D$ achieves $(\alpha, \beta)$-DP for any finite choice of $N$ and any $h_i \in H$.
Theorem 1 can be viewed as an extension of Hall et al. (2013), who allow for point-wise releases only. If the Hilbert space $H$ is also taken to be an RKHS, then Theorem 1 implies point-wise releases are protected as well, since in an RKHS point-wise evaluations are also continuous linear functionals. However, this theorem allows the release of any continuous linear functional. Additionally, notice that while $\theta_D$ must lie in $\mathcal{K}$, the $h_i$ can be any functions in $H$, which dramatically increases the release options as compared to Hall et al. (2013) or Aldà and Rubinstein (2017).
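In a discretized setting, the mechanism of Theorem 1 amounts to perturbing the summary curve once and then reading off as many linear functionals as desired. The sketch below is our own illustration: the grid, the simple quadrature, and the choice of Brownian-motion noise are assumptions made for concreteness, and $\delta$ is taken as given rather than computed from $\Delta$.

```python
import numpy as np

def private_projections(theta_vals, h_list, grid, delta, rng=None):
    """Add one calibrated Gaussian noise path to the summary curve, then
    return the private values of the functionals <theta~, h_i>."""
    rng = np.random.default_rng() if rng is None else rng
    w = grid[1] - grid[0]
    # Sample a Brownian motion path on the grid (covariance min(t, s)).
    bm = np.cumsum(rng.standard_normal(len(grid))) * np.sqrt(w)
    theta_tilde = theta_vals + delta * bm
    # Continuous linear functionals approximated by quadrature.
    return [w * np.dot(theta_tilde, h) for h in h_list]

grid = np.linspace(0, 1, 200)
theta = np.sin(np.pi * grid)                 # summary to protect
hs = [np.ones_like(grid), grid]              # two functionals: mean value, first moment
release = private_projections(theta, hs, grid, delta=0.1,
                              rng=np.random.default_rng(1))
```

Note that only one noise draw is used for all functionals; drawing fresh noise per functional would spend additional privacy budget.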
3.2 Full Function and Nonlinear Releases
While Section 3.1 covers a number of important cases, it does not cover all releases of potential interest, such as norms or derivatives. To guarantee privacy for these types of releases, we need to establish $(\alpha, \beta)$-DP for the entire function, not just finite projections. This means that in Definition 1 the metric space is taken to be $H$, which is infinite dimensional.
In previous works (e.g., Dwork et al., 2014; Hall et al., 2013), the probability inequalities in Definition 1 were established using bounds based on multivariate normal densities. This presents a serious problem for FDA, as there is nearly no work in the FDA literature which employs densities. This is due to the great difficulty in defining them for Hilbert space processes. For example, Delaigle and Hall (2010) are one of the few to tackle this problem, but get around it by defining a density only for finite "directions" of functional objects. Another example is Bongiorno and Goia (2015), who define pseudo-densities by carefully controlling "small ball" probabilities. Both works claim that for a functional object the density "generally does not exist." However, this turns out to be a technically incorrect claim, while still often being true in spirit. The correct statement is that, in general, it is difficult to define a useful density for functional data. In particular, to work with likelihood methods, a family of probability measures should all have a density with respect to the same base measure, which, at present, does not appear to be possible in general for functional data.
The concept of defining densities over function spaces has a very deep history. Given its potential importance in FDA, we will attempt to reference some of the major works. These results have been used extensively in areas such as mathematical finance and spatial statistics. Recall that a density is a Radon-Nikodym derivative (RND) of one measure with respect to another (herein called the base measure). For probability models, the base measure is typically Lebesgue or counting measure for continuous and discrete random variables, respectively. RNDs were first introduced in the works of Radon (1913) and Nikodym (1930). The existence of RNDs depends on the absolute continuity of measures, and thus a great deal of the work that followed focused on determining when infinite dimensional measures were equivalent or orthogonal; for example, see Cameron and Martin (1944, 1945, 1947); Kakutani (1948); Grenander (1950); Baxter (1956); Feldman (1958); Girsanov (1960); Varberg (1961); Rozanov (1964); Shepp (1964). Given two equivalent measures, what is the density of one with respect to the other? This question also has a very rich body of literature, including Rozanov (1962); Rao and Varadarajan (1963); Shepp (1966) and Skorohod (1967). In the early 1970s Black and Scholes published their landmark paper on option pricing (Black and Scholes, 1973), which utilized much of this theory, while Ibragimov and Rozanov published a book (Ibragimov and Rozanov, 1978) summarizing much of the discussed work, especially as it related to stationary Gaussian processes. Thereafter, much of the work has focused on specific areas of application, as the bulk of the foundational theory had been established.
Our goal in using densities is relatively straightforward. We aim to define densities (with respect to the same base measure) for the family of probability measures induced by $\{f(D) + \delta Z : D \in \mathcal{D}_n\}$, where $Z$ is a mean zero Gaussian process in $H$ with covariance operator $C$. It turns out that this is in fact possible because we are adding the exact same type of noise to each element. We give the following theorem, which can be viewed as a slight generalization and reformulation of Theorem 6.1 from Rao and Varadarajan (1963).

Theorem 2. Assume that the summary $\theta_D$ is compatible with a noise $Z$. Denote by $Q$ the probability measure over $H$ induced by $\delta Z$, and by $\{P_D : D \in \mathcal{D}\}$ the family of probability measures over $H$ induced by $\theta_D + \delta Z$. Then every measure $P_D$ has a density over $H$ with respect to $Q$, and the density is given by
$$\frac{dP_D}{dQ}(h) = \exp\left\{ -\frac{1}{2\delta^2} \left( \|f_D\|_{\mathcal{K}}^2 - 2 \langle h, f_D \rangle_{\mathcal{K}} \right) \right\},$$
$Q$ almost everywhere. This density is unique up to a set of $Q$ measure zero.
Theorem 2 implies that, for any measurable set $A \subset H$, we have
$$P_D(A) = \int_A \frac{dP_D}{dQ}(h) \, dQ(h).$$
To see why the density should be of the form given in Theorem 2, consider the density of $X_J = \{\langle \theta_D + \delta Z, v_j \rangle\}_{j=1}^J$, where $v_j$ is the $j$th eigenfunction of the covariance operator of $Z$. Then $X_J$ is multivariate normal with mean $\{\langle \theta_D, v_j \rangle\}_{j=1}^J$ and variances $\{\delta^2 \lambda_j\}_{j=1}^J$, which are the corresponding eigenvalues. While one typically defines the density of $X_J$ with respect to Lebesgue measure, we can instead define it with respect to another normal, namely $Z_J = \{\langle \delta Z, v_j \rangle\}_{j=1}^J$, which has the same distribution as $X_J$ except with zero mean. To define this density, we need only take the ratio of their densities with respect to Lebesgue measure. This gives
$$\exp\left\{ -\frac{1}{2\delta^2} \sum_{j=1}^{J} \frac{\langle \theta_D, v_j \rangle^2 - 2 \langle h, v_j \rangle \langle \theta_D, v_j \rangle}{\lambda_j} \right\}.$$
We can then take the limit with respect to $J$ and, by Parseval's identity, we would obtain the expression in Theorem 2. While a rigorous proof can be found in the appendix, this provides some intuition as to why the final density would take such a form.
Now that we have a well defined density, we can establish differential privacy for entire functions.

Theorem 3. Let $f_D := \theta_D$, and assume that it is compatible with a noise $Z$. Then $\tilde{f}_D := f_D + \delta Z$ achieves $(\alpha, \beta)$-DP over $H$, where $\delta$ is defined in Theorem 1.

We also have the following corollary.

Corollary 1. Let $f$ be compatible with a noise $Z$, and let $T$ be any measurable transformation from $H \to S$, where $S$ is a metric space. Then $T(f_D + \delta Z)$ achieves $(\alpha, \beta)$-DP over $S$, where $\delta$ is defined in Theorem 1.

Together, Theorem 3 and Corollary 1 imply that the Gaussian mechanism, first introduced by Hall et al. (2013) and expanded here, gives very broad privacy protection for functional data, as nearly any transformation or manipulation of the privacy enhanced release is guaranteed to maintain DP; this is known as the post-processing property (e.g., see Dwork and Roth (2014)).
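A discretized sketch of Corollary 1 (our own illustration, with Brownian-motion noise and a fixed $\delta$ assumed for concreteness): once the full curve has been released with calibrated Gaussian noise, any measurable post-processing of it, here an $L^2$ norm and a finite-difference derivative, inherits the same $(\alpha, \beta)$-DP guarantee at no extra privacy cost.

```python
import numpy as np

def release_full_function(theta_vals, grid, delta, rng=None):
    """Full-function Gaussian mechanism on a grid (Theorem 3 sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    w = grid[1] - grid[0]
    bm = np.cumsum(rng.standard_normal(len(grid))) * np.sqrt(w)  # noise path
    return theta_vals + delta * bm

def l2_norm(f_vals, grid):
    """Quadrature approximation of the L^2[0,1] norm."""
    w = grid[1] - grid[0]
    return np.sqrt(np.sum(f_vals ** 2) * w)

grid = np.linspace(0, 1, 300)
private_curve = release_full_function(np.sin(np.pi * grid), grid, delta=0.05,
                                      rng=np.random.default_rng(7))
private_norm = l2_norm(private_curve, grid)         # still (alpha, beta)-DP
private_deriv = np.gradient(private_curve, grid)    # still (alpha, beta)-DP
```

The key point is that both derived quantities are computed from the already-private curve, so no further noise or accounting is required.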
4 Global Sensitivity Bounds for RKHS Smoothing
In Sections 5 and 6 we illustrate how to produce private releases of mean function estimates based on RKHS smoothing. Here we derive a corresponding bound on the global sensitivity to be used in generating the privacy enhanced estimates. There are, of course, a multitude of methods for estimating smooth functions; however, an RKHS approach is especially amenable to our privacy mechanism. In this case we can define an RKHS smoother by using the kernel of the noise, $C$, to define the space $\mathcal{K}$, i.e., $K = C$. The RKHS estimate of the mean $\mu$ is then defined as
$$\hat{\mu} = \operatorname*{argmin}_{M \in \mathcal{K}} g(M) \quad \text{where} \quad g(M) = \frac{1}{N} \sum_{i=1}^{N} \|X_i - M\|_H^2 + \phi \|M\|_{\mathcal{K}}^2,$$
where $\phi$ is called the penalty parameter. Here we can see the advantage of an RKHS approach, as it forces the estimate to lie in the space $\mathcal{K}$, which means that the compatibility condition is satisfied. A kernel different from the noise could be used, but one must be careful to make sure that the compatibility condition is met. If $(\lambda_j, v_j)$ are the eigenvalue/eigenfunction pairs of the kernel $K$ and $\{X_i = \sum_{j=1}^{\infty} x_{ij} v_j : i = 1, \dots, N\}$, with $x_{ij} = \langle X_i, v_j \rangle$, then the estimate can be expressed as
$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{\infty} \frac{\lambda_j}{\lambda_j + \phi} x_{ij} v_j; \qquad (3)$$
see Appendix B for details. We then have the following result.
Theorem 4. If the $H$ norm of all elements of the population is bounded by a constant $0 < \tau < \infty$, then the GS for RKHS estimation is bounded by
$$\Delta_n^2 \le \frac{4 \tau^2}{N^2} \sum_{j=1}^{\infty} \frac{\lambda_j}{(\lambda_j + \phi)^2}.$$

The resulting bound is practically very useful. One can rescale the data so that their $H$ bound is, for example, 1, and then the remaining quantities are all tied to the noise/RKHS kernel and the penalty parameter, from which the amount of noise needed to achieve DP can be constructed.
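The whole recipe of this section can be sketched end to end. The code below is our own discretized implementation, not code from the paper: it computes the RKHS smoother (3) via eigen-shrinkage, bounds the global sensitivity with Theorem 4, calibrates $\delta$ as in Theorem 1, and adds noise with the matching covariance. The uniform-grid quadrature is a stated simplification.

```python
import numpy as np

def private_rkhs_mean(X, kernel, grid, phi, alpha, beta, tau, rng=None):
    """Private RKHS mean estimate: shrinkage smoother + Theorem 4 GS bound
    + Theorem 1 noise calibration, all in one discretized sketch."""
    rng = np.random.default_rng() if rng is None else rng
    N = X.shape[0]
    w = grid[1] - grid[0]
    lam, V = np.linalg.eigh(kernel(grid[:, None], grid[None, :]) * w)
    lam = np.clip(lam, 0.0, None)             # numerical negatives -> 0
    # RKHS smoother (3): shrink each basis score by lambda_j / (lambda_j + phi).
    scores = V.T @ X.mean(axis=0)
    mu_hat = V @ (lam / (lam + phi) * scores)
    # Theorem 4: Delta^2 <= 4 tau^2 / N^2 * sum_j lambda_j / (lambda_j + phi)^2.
    gs_sq = 4.0 * tau ** 2 / N ** 2 * np.sum(lam / (lam + phi) ** 2)
    # Theorem 1: delta^2 = 2 log(2/beta) * Delta^2 / alpha^2.
    delta = np.sqrt(2.0 * np.log(2.0 / beta) * gs_sq) / alpha
    # Noise drawn with the same covariance C = K (eigenvalues lam).
    Z = V @ (np.sqrt(lam) * rng.standard_normal(len(grid)))
    return mu_hat + delta * Z

grid = np.linspace(0, 1, 100)
kern = lambda t, s: np.exp(-(t - s) ** 2 / (2 * 0.2 ** 2))  # squared exponential
rng = np.random.default_rng(3)
X = np.sin(np.pi * grid) + 0.3 * rng.standard_normal((25, len(grid)))
mu_tilde = private_rkhs_mean(X, kern, grid, phi=1e-3, alpha=1.0, beta=0.1,
                             tau=2.0, rng=rng)
```

Increasing $\phi$ shrinks both $\hat{\mu}$ and the sensitivity bound, which is precisely the smoothing/noise trade-off examined in the simulations below.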
5 Empirical Study
In this section we present simulations with $H = L^2[0,1]$ to explore the impact of different quantities on the utility of privacy enhanced releases. This exploration will occur via the problem of estimating the mean function from a random sample of functional observations using RKHS smoothing, as discussed in Section 4. The quantities we will explore include the kernel function $K(t,s)$ used to define $\mathcal{K}$, the privacy parameters $(\alpha, \beta)$, and the penalty parameter $\phi$ that (combined with $K$) controls the level of smoothing.
For the RKHS, $\mathcal{K}$, we consider four popular kernels, $K_1$ through $K_4$, given in (4). Each kernel is from the Matérn family of covariances (Stein, 2012), though the first is also known as the Gaussian or squared exponential kernel and the last is also known as the exponential, Laplacian, or Ornstein-Uhlenbeck kernel.
We simulate data using the Karhunen-Loève expansion, a common approach in FDA simulation studies. In particular we take
$$X_i(t) = \mu(t) + \sum_{j=1}^{m} \xi_{ij} v_j(t),$$
where the scores $\xi_{ij}$ are independent mean zero normals with variances $\lambda_j$ (decaying at a rate governed by the smoothness parameter $p$, with the $v_j$ the corresponding eigenfunctions), and $m$ was taken as the largest value such that $\lambda_m$ was numerically different from zero in R (usually about $m = 50$). The remaining parameters will be fixed in all scenarios, except for the one where they are explicitly varied to consider their effect.

• The penalty parameter we take to be $\phi = 0.001$, except in Scenario 1.
• The range parameter for the kernel used to define $\mathcal{K}$ we take to be $\rho = 0.001$, except in Scenario 2.
• The covariance function for the noise will be $K_1$ with $\rho = 0.2$, except in Scenario 3. In all settings the RKHS kernel and the noise kernel will be taken to be the same.
• The smoothness parameter for the data will be taken as $p = 4$, except in Scenario 4.
• The DP parameters will be set at $\alpha = 1$ and $\beta = 0.1$, except in Scenario 5.
• The sample size will be taken as $N = 25$, except in Scenario 6.
• The mean function will be taken as $\mu(t) = 0.1 \sin(\pi t)$, except in Scenario 7.

All of the curves are generated on an equally spaced grid between 0 and 1, with 100 points.
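The simulation design above can be sketched as follows. This is a hedged illustration of our own: the paper's exact eigenvalue decay for the smoothness parameter $p$ and its eigenfunctions are not reproduced here, so we assume a polynomial decay $\lambda_j = j^{-p}$ and a Fourier sine basis purely for concreteness.

```python
import numpy as np

def simulate_kl(N, grid, mu, p, m=50, rng=None):
    """Draw N curves on a grid via a truncated Karhunen-Loeve expansion,
    X_i(t) = mu(t) + sum_j xi_ij v_j(t), with Var(xi_ij) = lambda_j."""
    rng = np.random.default_rng() if rng is None else rng
    j = np.arange(1, m + 1)
    lam = j ** (-float(p))                                  # assumed eigenvalue decay
    V = np.sqrt(2.0) * np.sin(np.pi * j[None, :] * grid[:, None])  # assumed basis
    xi = rng.standard_normal((N, m)) * np.sqrt(lam)         # KL scores
    return mu(grid)[None, :] + xi @ V.T

grid = np.linspace(0, 1, 100)
X = simulate_kl(N=25, grid=grid, mu=lambda t: 0.1 * np.sin(np.pi * t), p=4)
```

Larger $p$ makes the scores decay faster and hence the sample paths smoother, mirroring the role of $p$ in Scenario 4.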
In all of these settings the risk is fixed by choosing the $\alpha$ and $\beta$ in the definition of DP. We thus focus on the utility of the privacy enhanced curves by comparing them graphically to the original estimates and the data. Ideally, the original estimate will be close to the truth and the privacy enhanced version will be close to the original estimate. What we will see is that by compromising slightly on the former, one can make substantial gains in the latter.
Scenario 1: Varying penalty term φ

In this setting all of the previously discussed defaults are used except the penalty, $\phi$, which varies from $10^{-6}$ to 1. In Figure 1 we plot all of the generated curves in grey, the RKHS smoothed mean in green, and the privacy enhanced version in red. We can see that as the penalty increases, both estimates shrink towards each other and to zero. There is a clear "sweet spot" in terms of utility, where the smoothing has helped reduce the amount of noise one has to add to the release.

Figure 1: Original and private RKHS smoothing mean with Gaussian Kernel (K1) for different values of penalty parameter φ (panels: φ = 10⁻⁶, 0.001, 0.1, 1; grey: sample functions, green: original summary, red: private summary).
Scenario 2: Varying kernel range parameter ρ

Here all defaults are used except the range parameter for the noise and RKHS (which are taken to be the same in all settings), which ranges from 0.002 to 2. The results are presented in Figure 2. We see very similar patterns to Scenario 1, where increasing $\rho$ increases the smoothing of both the estimate and its privacy enhanced version. However, increasing $\rho$ smooths more than it shrinks the estimates, and there is still a non-negligible difference between the two estimates for larger values. Practically, both $\rho$ and $\phi$ should be chosen together for the best performance, which we will explore further in Section 6.
Figure 2: Original and private RKHS smoothing mean with Gaussian Kernel (K1) for different values of kernel range parameter ρ (panels: ρ = 0.002, 0.02, 0.2, 2).

Scenario 3: Varying the kernel function k(t, s)

Here we consider the four different kernels given in (4) for both the noise and RKHS kernel. All four produce roughly the same pattern; however, $K_1$ produces curves which are infinitely differentiable, while
the exponential kernel produces curves that are nowhere differentiable (they follow an Ornstein-Uhlenbeck process). The two Matérn covariances give paths that have either one (K3) or two (K2) derivatives. Since the underlying function to be estimated is already very smooth, the kernel does not have a substantial impact. However, for more irregular shapes, this choice can play a substantial role in the efficiency of the resulting RKHS estimate.
Scenario 4: Varying the smoothing parameter of samples p
In this setting we vary $p$ from 1.1 to 4, which determines the smoothness of the data, $X_i(t)$. Figure 4 summarizes these results. As we can see, the smoothness of the curves has a smaller impact on the utility of the privacy enhanced versions as compared to other parameters.
Figure 3: Original and private RKHS smoothing mean for different kernels (panels: Gaussian (K1), Matérn 5/2 (K2), Matérn 3/2 (K3), Exponential (K4)).

Still, some differences can be seen. As the curves become smoother, the global sensitivity decreases, implying that less noise needs to be added in order to maintain the desired privacy level, and thus resulting in a higher utility for the privacy enhanced curves. However, the smoothness, in terms of derivatives, of the estimates is not affected, as this is determined by the kernel.
Scenario 5: Varying the privacy parameters (α, β)
In this setting we vary the privacy parameters, $\alpha$ and $\beta$. Figure 5 presents the effects of varying $\alpha$ from 5 to 0.1, while in Figure 6 we vary $\beta$ from 0.1 to $10^{-6}$. As we decrease the parameters, we are requiring a stricter form of privacy, which is reflected in the plots; recall that $\beta = 0$ would give us the stricter form of DP, $\alpha$-DP. As we decrease these values, the overall noise added increases, and we expect larger deviation of the privacy enhanced curve from the mean. As expected, there is less sensitivity in the output to changes in $\beta$ than to $\alpha$. However, as with the previous
Figure 4: Original and private RKHS smoothing mean with Gaussian Kernel (K1) for different values of smoothing parameter of samples p (panels: p = 1.1, 2, 3, 4).

scenarios, stricter settings primarily increase the noise without changing the smoothness of the resulting estimates.
Scenario 6: Varying sample size N
In Figure 7 we vary the sample size from 5 to 100. The results are very similar to those for changing β and α: the sample size does not influence the smoothness of the curves (in terms of derivatives), but the accuracy of the estimate (green curve) gets much better, and so does the utility of the privacy enhanced release.
Figure 5: Original and private RKHS smoothing mean with Gaussian kernel (K1) for different values of the privacy parameter α (α = 5, 1, 0.5, 0.1) when β = 1. Each panel shows the sample functions, the original summary, and the private summary.
Scenario 7: Different underlying mean function µ
Lastly, in Figure 8 we consider three additional mean functions. Overall, the actual function being estimated does not influence the utility of the privacy enhanced version, only the accuracy of the original estimate. This is because the noise to be added is computed from the smoothing parameters and the range of the L2 norm of the data, not from the underlying estimate itself.
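The point that the added noise does not depend on the underlying estimate can be made concrete with a small sketch of the release step. This is an illustration, not the paper's implementation: the function name `private_release`, the squared-exponential noise covariance, and the way the sensitivity Δ enters are assumptions patterned on the Gaussian mechanism of Hall et al. (2013).

```python
import numpy as np

def private_release(f_hat, t, alpha, beta, Delta, rho=0.1, seed=0):
    """Add a scaled Gaussian-process draw to the estimate f_hat on grid t.
    The scale sqrt(2 log(2/beta)) * Delta / alpha follows the Gaussian
    mechanism; the squared-exponential covariance is an illustrative choice
    for the noise process and depends only on (alpha, beta, Delta, rho)."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 * np.log(2.0 / beta)) * Delta / alpha
    K = np.exp(-((t[:, None] - t[None, :]) ** 2) / rho ** 2)
    K += 1e-10 * np.eye(len(t))          # jitter for numerical stability
    Z = rng.multivariate_normal(np.zeros(len(t)), K)
    return f_hat + sigma * Z

t = np.linspace(0.0, 1.0, 50)
mu_hat = 0.1 * np.sin(2 * np.pi * t)     # stand-in for the original summary
loose = private_release(mu_hat, t, alpha=1.0, beta=0.1, Delta=0.5)
strict = private_release(mu_hat, t, alpha=0.1, beta=0.1, Delta=0.5)
```

Because both releases use the same seed, the noise draw is identical, so the stricter α = 0.1 release deviates from the estimate exactly ten times more than the α = 1 release, mirroring the behavior seen in Scenario 5.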
6 Application: Diffusion Tensor Imaging
Regression with Functional Data, or refund (Huang et al., 2016), is an R package that includes a variety of tools related to both linear and nonlinear regression for functional data. DTI or
Figure 6: Original and private RKHS smoothing mean with Gaussian kernel (K1) for different values of the privacy parameter β (β = 0.1, 0.01, 0.001, 10⁻⁶) when α = 1. Each panel shows the sample functions, the original summary, and the private summary.
profiles for the corpus callosum (CCA) and the right corticospinal tract (RCST) for patients with Multiple Sclerosis as well as controls. This type of imaging data is becoming more and more common, and the privacy concerns can be substantial. For example, one would be concerned about releasing the image of a face from a study, but images of the brain or other major organs might also be sensitive, especially if the study is related to some complex disease such as cancer, HIV, etc. Thus it is useful to illustrate how to produce privacy enhanced versions of statistics such as mean functions. We focus on the cca data, which includes 382 patients measured at 93 equally spaced locations along the CCA, and use them to illustrate our proposed methodology. Our aim is to release a privacy enhanced RKHS smoothing estimate of the mean function. We consider only the kernels K1, K3, and K4, which correspond to the Gaussian kernel, the Matérn kernel with ν = 3/2, and the exponential kernel, respectively.
Figure 7: Original and private RKHS smoothing mean with Gaussian kernel (K1) for different sample sizes N (N = 5, 10, 25, 100). Each panel shows the sample functions, the original summary, and the private summary.
The first is regular Cross Validation, CV, and the second we call Private Cross Validation, PCV. In CV we fix φ and then take the ρ that gives the minimum 10-fold cross validation score. We do not select φ based on cross validation because, based on our observations, the minimum score is always obtained at the minimum φ for this data. In PCV we take nearly the same approach; however, when computing the CV score we take the expected difference between our privacy enhanced estimate and the data from the left-out fold. This expected difference is computed using a Monte-Carlo approximation, with 1000 draws from the distribution of the privacy enhanced estimate. We then find both the φ and ρ which give the optimal PCV score based on a grid search.
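The PCV procedure just described can be sketched as follows. This is a toy stand-in rather than the paper's code: the ridge-type smoother, the `noise_scale` placeholder for the sensitivity-based noise level, and the small grid are all assumptions; only the structure (a Monte-Carlo estimate of the expected squared distance to the held-out fold, minimized over a grid of (φ, ρ)) mirrors the text.

```python
import numpy as np

def noise_scale(phi, rho):
    # placeholder for the noise level delta, which in the paper is computed
    # from the sensitivity bound and the privacy parameters (alpha, beta)
    return 0.05 / np.sqrt(1.0 + phi)

def pcv_score(X, t, phi, rho, n_mc=200, n_folds=5, seed=0):
    """Private Cross Validation score: Monte-Carlo estimate of the expected
    squared distance between the privacy-enhanced estimate (fit on the
    training folds) and the curves in the held-out fold."""
    rng = np.random.default_rng(seed)
    K = np.exp(-((t[:, None] - t[None, :]) ** 2) / rho ** 2)
    folds = np.array_split(np.arange(len(X)), n_folds)
    score = 0.0
    for held in folds:
        train = np.setdiff1d(np.arange(len(X)), held)
        # stand-in for the RKHS smoother applied to the training mean
        mu_hat = K @ np.linalg.solve(K + phi * np.eye(len(t)), X[train].mean(0))
        draws = mu_hat + noise_scale(phi, rho) * rng.multivariate_normal(
            np.zeros(len(t)), K + 1e-10 * np.eye(len(t)), size=n_mc)
        # expected squared distance to each held-out curve, averaged
        diffs = draws[:, None, :] - X[held][None, :, :]
        score += np.mean(np.sum(diffs ** 2, axis=2))
    return score / n_folds

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 30)
X = 0.1 * np.sin(2 * np.pi * t) + 0.02 * rng.standard_normal((20, len(t)))
grid = [(phi, rho) for phi in (1e-3, 1e-2, 0.03) for rho in (0.1, 0.5)]
best_phi, best_rho = min(grid, key=lambda pr: pcv_score(X, t, *pr))
```

The grid search is exactly the one-line `min` at the end; in practice the 1000 Monte-Carlo draws of the paper would replace the smaller `n_mc` used here for speed.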
We begin by presenting the results for CV. For each of the kernels, we fixed a value for φ ∈ {0.0001, 0.001, 0.01, 0.03} and then varied ρ over [0.01, 2]. Table 1 shows the optimal ρ for each kernel and each fixed φ.
Figure 8: Original and private RKHS smoothing mean with Gaussian kernel (K1) for different initial mean functions µ: 0.1 sin(2πt), 0.1 sin(πt), 0.1t, and (t − 0.5)². Each panel shows the sample functions, the original summary, and the private summary.
We use the optimal values from Table 1 to produce privacy enhanced estimates. The results are given in Figures 9, 10, and 11 for the exponential, Matérn, and Gaussian kernels, respectively. In each case we see that the utility of the privacy enhanced versions increases as φ increases; however, the largest values of φ produce estimates that are oversmoothed. There is a good trade-off between privacy and utility with φ = 0.01 and K3 or K1 (see Figures 10 and 11), while K4 is a bit rougher and does not perform as well (e.g., Figure 9).
Turning to PCV, Table 2 shows the ranges over which we varied (φ, ρ), along with the optima that minimize the PCV score. We use the optimal parameters to generate privacy enhanced estimates, given in Figure 12. Here we see that the utility of the privacy enhanced estimates is excellent, especially for K1. Using PCV tends to oversmooth the original estimates (green lines), however,
   φ        Exp. Kernel ρ    Mat3/2 Kernel ρ    Gau. Kernel ρ
   0.0001   0.25             0.25               0.01
   0.001    0.20             0.20               0.01
   0.01     0.30             0.30               0.03
   0.03     0.80             0.80               0.05
Table 1: Optimal kernel range parameter ρ for each kernel, selected by CV for each fixed penalty parameter φ, in the DTI dataset.
Figure 9: Mean estimate for CCA and its private release with the Exponential kernel (K4) using CV.
7 Conclusions
In this work we have provided a mechanism for achieving (α, β)-DP for a wide range of summaries related to functional parameter estimates. This work expands dramatically upon the work of Hall et al. (2013), who explored this topic in the context of point-wise releases of functions. Our work covers theirs as a special case, but also includes full-function releases and nonlinear releases, which have not been considered before in the statistical privacy literature. In
Figure 10: Mean estimate for CCA and its private release using the Matérn(3/2) kernel (K3) with CV.
   Kernel   range φ         range ρ        optimum φ   optimum ρ
   K4       [10⁻⁴, 0.1]     [0.2, 1]       0.01        0.466
   K3       [10⁻⁴, 0.1]     [0.05, 0.5]    0.01        0.450
   K1       [10⁻⁴, 0.1]     [0.01, 0.1]    0.005       0.030
Table 2: Optimal penalty and range parameters (φ, ρ) for each kernel, selected by PCV, in the CCA application.
Figure 12: Mean estimate of CCA and its private release for the Exponential (K4), Matérn(3/2) (K3), and Gaussian (K1) kernels using PCV.
a study may collect and analyze many pieces of information such as genomic sequence data, biomarker data, and biomedical images, which alone, or if linked and/or combined with each other or with other demographic information, lead to greater disclosure risk, albeit one that still requires thorough understanding itself; for a recent example of potential identification risks related to functional data, see Lippert et al. (2017).
The heart of our work utilizes densities for functional data in a way that has not yet been explored in the literature. Previously, densities for functional data were thought not to exist (Delaigle and Hall, 2010), and researchers instead relied on various approximations to densities. Here we show that densities do in fact exist, and we utilize them in our theoretical results. At the heart of this issue is the orthogonality of probability measures; in infinite-dimensional spaces, dealing with the orthogonality of measures becomes a major issue, which has a direct impact on the utility of density-based methods for FDA.
The literature on privacy for scalar and multivariate data is quite extensive, while there has
been very little work done for FDA. Therefore, there are many opportunities for developing
additional theory and methods for functional data and estimates. One issue that we believe
will be especially important is the role of smoothing and regularization in preserving the utility
of privacy enhanced releases. As we have seen, a bit of extra smoothing can go a long way in
terms of maintaining privacy.
References
F. Aldà and B. I. Rubinstein. The Bernstein mechanism: Function release under differential privacy. In AAAI, pages 1705–1711, 2017.
G. Baxter. A strong limit theorem for Gaussian processes. Proceedings of the American Mathematical Society, 7(3):522–527, 1956.
A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, 2011.
F. Black and M. Scholes. The pricing of options and corporate liabilities. The Journal of Political Economy, pages 637–654, 1973.
E. Bongiorno and A. Goia. Classification methods for Hilbert data based on surrogate density. arXiv preprint arXiv:1506.03571, 2015.
R. Cameron and W. Martin. Transformations of Wiener integrals under a general class of linear transformations. Transactions of the American Mathematical Society, 58(2):184–219, 1945.
R. H. Cameron and W. T. Martin. Transformations of Wiener integrals under translations. Annals of Mathematics, pages 386–396, 1944.
R. H. Cameron and W. T. Martin. The behavior of measure and measurability under change of scale in Wiener space. Bulletin of the American Mathematical Society, 53(2):130–137, 1947.
K. Chaudhuri, C. Monteleoni, and D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.
T. Dalenius. Towards a methodology for statistical disclosure control. Statistik Tidskrift, 15:429–444, 1977.
A. Delaigle and P. Hall. Defining probability density for a distribution of random functions. The Annals of Statistics, 38(2):1171–1193, 2010.
C. Dwork. Differential privacy. In 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), volume 4052, pages 1–12, Venice, Italy, July 2006. Springer Verlag. ISBN 3-540-35907-9. URL https://www.microsoft.com/en-us/research/publication/differential-privacy/.
C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Found. Trends
Theor. Comput. Sci., 9(3–4):211–407, Aug. 2014. ISSN 1551-305X. doi: 10.1561/
0400000042. URL http://dx.doi.org/10.1561/0400000042.
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the 24th Annual International Conference on The Theory and Applications of Cryptographic Techniques, EUROCRYPT'06, pages 486–503, Berlin, Heidelberg, 2006a. Springer-Verlag. ISBN 3-540-34546-9, 978-3-540-34546-6. doi: 10.1007/11761679_29. URL http://dx.doi.org/10.1007/11761679_29.
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis, pages 265–284. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006b. ISBN 978-3-540-32732-5. doi: 10.1007/11681878_14. URL https://doi.org/10.1007/11681878_14.
Y. Erlich and A. Narayanan. Routes for breaching and protecting genetic privacy. Nature
Reviews Genetics, 15(6):409–421, 2014.
J. Feldman. Equivalence and perpendicularity of Gaussian processes. Pacific Journal of Mathematics, 8(4):699–708, 1958.
F. Ferraty and Y. Romain. The Oxford Handbook of Functional Data Analysis. Oxford University Press, 2011.
S. Fienberg and A. Slavkovi´c. Data Privacy and Confidentiality, pages 342–345. International Encyclopedia of Statistical Science. Springer-Verlag, 2010.
I. V. Girsanov. On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Theory of Probability & Its Applications, 5(3):285–301, 1960.
U. Grenander. Stochastic processes and statistical inference. Arkiv för Matematik, 1(3):195–277, 1950.
R. Hall, A. Rinaldo, and L. Wasserman. Differential privacy for functions and functional data.
The Journal of Machine Learning Research, 14(1):703–727, 2013.
L. Horváth and P. Kokoszka. Inference for functional data with applications, volume 200. Springer Science & Business Media, 2012.
L. Huang, F. Scheipl, J. Goldsmith, J. Gellar, J. Harezlak, M. W. McLean, B. Swihart, L. Xiao, C. Crainiceanu, and P. Reiss. refund: Regression with Functional Data, 2016. URL https://CRAN.R-project.org/package=refund. R package version 0.1-14.
I. A. Ibragimov and Y. A. Rozanov. Gaussian random processes, volume 9. Springer Science & Business Media, 1978.
S. Kakutani. On equivalence of infinite product measures. Annals of Mathematics, pages 214– 224, 1948.
D. Kifer, A. Smith, and A. Thakurta. Private convex empirical risk minimization and high-dimensional regression. In S. Mannor, N. Srebro, and R. C. Williamson, editors, Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 25.1–25.40, Edinburgh, Scotland, 25–27 Jun 2012. PMLR. URL http://proceedings.mlr.press/v23/kifer12.html.
P. Kokoszka and M. Reimherr. Introduction to functional data analysis. CRC Press, 2017.
J. Kulynych. Legal and ethical issues in neuroimaging research: human subjects protection, medical privacy, and the public communication of research results. Brain and cognition, 50 (3):345–357, 2002.
R. Laha and V. Rohatgi. Probability Theory. Wiley, New York, 1979.
J. Lane, V. Stodden, S. Bender, and H. Nissenbaum. Privacy, big data, and the public good:
Frameworks for engagement. Cambridge University Press, 2014.
C. Lippert, R. Sabatini, M. C. Maher, E. Y. Kang, S. Lee, O. Arikan, A. Harley, A. Bernal, P. Garst, V. Lavrenko, K. Yocum, T. Wong, M. Zhu, W.-Y. Yang, C. Chang, T. Lu, C. W. H. Lee, B. Hicks, S. Ramakrishnan, H. Tang, C. Xie, J. Piper, S. Brewerton, Y. Turpaz, A. Telenti, R. K. Roby, F. J. Och, and J. C. Venter. Identification of individuals by trait prediction using whole-genome sequencing data. Proceedings of the National Academy of Sciences, 114(38):10166–10171, 2017. doi: 10.1073/pnas.1711125114. URL http://www.pnas.org/content/114/38/10166.abstract.
O. Nikodym. Sur une généralisation des intégrales de M. J. Radon. Fundamenta Mathematicae, 15(1):131–179, 1930.
J. Radon. Theorie und Anwendungen der absolut-additiven Mengenfunktionen. Sitzungsberichte der mathematisch-naturwissenschaftlichen Klasse der Kaiserlichen Akademie der Wissenschaften, 122:1295–1438, 1913.
J. O. Ramsay and B. W. Silverman. Applied functional data analysis: methods and case studies, volume 77. Springer New York, 2002.
J. O. Ramsay and B. W. Silverman. Functional data analysis. Springer New York, 2005.
C. R. Rao and V. Varadarajan. Discrimination of Gaussian processes. Sankhyā: The Indian Journal of Statistics, Series A, pages 303–330, 1963.
Y. A. Rozanov. On the density of one Gaussian measure with respect to another. Theory of Probability & Its Applications, 7(1):82–87, 1962.
Y. A. Rozanov. On probability measures in functional spaces corresponding to stationary Gaussian processes. Theory of Probability & Its Applications, 9(3):404–420, 1964.
E. E. Schadt, S. Woo, and K. Hao. Bayesian method to predict individual SNP genotypes from gene expression data. Nature Genetics, 44(5):603–608, 2012.
L. Shepp. The singularity of Gaussian measures in function space. Proceedings of the National Academy of Sciences, 52(2):430–433, 1964.
L. Shepp. Radon–Nikodym derivatives of Gaussian measures. The Annals of Mathematical Statistics, pages 321–354, 1966.
A. Skorohod. On the densities of probability measures in functional spaces. Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66) Vol. II: Contributions to
Probability Theory, 1:163–182, 1967.
M. L. Stein. Interpolation of spatial data: some theory for kriging. Springer Science & Business Media, 2012.
D. Varberg. On equivalence of Gaussian measures. Pacific Journal of Mathematics, 11(2):751–762, 1961.
D. E. Varberg. On Gaussian measures equivalent to Wiener measure. Transactions of the American Mathematical Society, 113(2):262–273, 1964.
D. E. Varberg. On Gaussian measures equivalent to Wiener measure II. Mathematica Scandinavica, 18(2):143–160, 1967.
L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010. doi: 10.1198/jasa.2009.tm08651. URL http://dx.doi.org/10.1198/jasa.2009.tm08651.
L. Willenborg and T. De Waal. Statistical disclosure control in practice. Number 111. Springer Science & Business Media, 1996.
A Proofs
Lemma 1. If the privacy mechanism $\tilde f_D$ achieves $(\alpha, \beta)$-DP, then so does any measurable transformation of it.
Proof. Let $T : \Omega \to \Omega'$ be a measurable transformation between two metric spaces (say with respect to the Borel $\sigma$-algebras). Then we have
\[
P(T(\tilde f_D) \in A) = P(\tilde f_D \in T^{-1}(A)) \le e^{\alpha} P(\tilde f_{D'} \in T^{-1}(A)) + \beta = e^{\alpha} P(T(\tilde f_{D'}) \in A) + \beta,
\]
as required.
Proof of Theorem 1. The result will follow if we can show that
matrix can be expressed using the covariance operator $C$, namely,
\[
K_{ij} = E[\langle Z, h_i\rangle_H \langle Z, h_j\rangle_H] = \langle C(h_i), h_j\rangle_H.
\]
has by direct verification that
We can then swap from the inner product over $H$ to one over $K$:
\[
\langle f_D - f_{D'},\, P_1(f_D - f_{D'})\rangle_H = \langle f_D - f_{D'},\, C(P_1(f_D - f_{D'}))\rangle_K = \langle f_D - f_{D'},\, P(f_D - f_{D'})\rangle_K.
\]
If we show that $P$ is a projection operator, i.e., symmetric and idempotent, we will have the desired bound:
\[
\langle f_D - f_{D'},\, P(f_D - f_{D'})\rangle_K \le \|f_D - f_{D'}\|_K^2.
\]
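The step from "P is a projection" to the bound deserves one line of justification: a symmetric idempotent operator is a contraction in the corresponding inner product. With $g = f_D - f_{D'}$:

```latex
% Idempotency and symmetry of P in the K inner product give
\langle g, Pg\rangle_K = \langle g, P^2 g\rangle_K = \langle Pg, Pg\rangle_K = \|Pg\|_K^2 .
% Cauchy--Schwarz then bounds the projected element:
\|Pg\|_K^2 = \langle Pg, g\rangle_K \le \|Pg\|_K \, \|g\|_K
\quad\Longrightarrow\quad \|Pg\|_K \le \|g\|_K ,
% hence \langle g, Pg\rangle_K = \|Pg\|_K^2 \le \|g\|_K^2, the desired bound.
```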
First, $P$ is idempotent because
\begin{align*}
P^2 &= \sum_{i=1}^{N} C(h_i) \sum_{j=1}^{N} (K^{-1})_{ij} \Bigl\langle h_j,\; \sum_{k=1}^{N} C(h_k) \sum_{l=1}^{N} (K^{-1})_{kl} \langle h_l, \cdot\rangle_H \Bigr\rangle_H \\
&= \sum_{i=1}^{N} C(h_i) \sum_{j=1}^{N} (K^{-1})_{ij} \sum_{k=1}^{N} \langle C(h_k), h_j\rangle_H \sum_{l=1}^{N} (K^{-1})_{kl} \langle h_l, \cdot\rangle_H \\
&= \sum_{i=1}^{N} C(h_i) \sum_{l=1}^{N} (K^{-1})_{il} \langle h_l, \cdot\rangle_H = P.
\end{align*}
Second, $P$ is symmetric with respect to the $K$ inner product since
Hence $P$ is a projection operator from $H$ to $K$, and the claim of the theorem holds.
Proof of Theorem 2. We aim to prove that $\frac{dP_D}{dQ}(h) = \exp$
the same probability measure. We will accomplish this by showing that both have the same moment generating functional. Since $Z$ is Gaussian noise with mean zero and covariance operator $C$, $\tilde f_D$ is a Gaussian process with mean $f_D$ and covariance operator $\delta^2 C$. The moment generating functional of $\tilde f_D$ is then given by
On the other hand, the moment generating functional of $P^{\star}$ is
So both have the same moment generating functional and are thus the same measure, which proves the claim.
Proof of Theorem 3. We aim to show that for any measurable subset $A \subset H$ we have
\[
P_D(A) \le e^{\alpha} P_{D'}(A) + \beta.
\]
Recall that the global sensitivity for the functional case is
\[
\Delta^2 = \sup
\]
We equivalently aim to show that
We can express
$\alpha$. Then trivially we have that
\[
P_D(A) = P_D(A \cap K_1) + P_D(A \cap K_2).
\]
Using the definition of $K_1$ we have that
\[
P_D(A \cap K_1) =
\]
The proof will be complete if we can show that
\[
P_D(A \cap K_2) \le \beta.
\]
This is equivalent to showing that
\[
P\Bigl( -\tfrac{1}{2\delta^2}\bigl( -\|f_D - f_{D'}\|_K^2 - 2\langle f_D - f_{D'},\, X\rangle_K \bigr) \ge \alpha \Bigr) \le \beta,
\]
where $X \sim N(0, \delta^2 C)$. This can equivalently be stated as
\begin{align*}
-\tfrac{1}{2\delta^2}\bigl( -\|f_D - f_{D'}\|_K^2 - 2\langle f_D - f_{D'},\, X\rangle_K \bigr) \ge \alpha
&\iff \langle f_D - f_{D'},\, X\rangle_K \ge \delta^2\Bigl( \alpha - \tfrac{1}{2\delta^2}\|f_{D'} - f_D\|_K^2 \Bigr) \\
&\iff \langle C^{-1}(f_D - f_{D'}),\, X\rangle_H \ge \delta^2\Bigl( \alpha - \tfrac{1}{2\delta^2}\|f_D - f_{D'}\|_K^2 \Bigr).
\end{align*}
However, $\langle X, C^{-1}(f_D - f_{D'})\rangle_H$ is a normal random variable with mean zero and variance
\[
\delta^2 \langle C(C^{-1}(f_{D'} - f_D)),\, C^{-1}(f_{D'} - f_D)\rangle_H = \delta^2 \|f_{D'} - f_D\|_K^2 \le \delta^2 \Delta^2.
\]
So, if $Z \sim N(0,1)$, then we have that
\begin{align*}
P\Bigl( -\tfrac{1}{2\delta^2}\bigl( -\|f_D - f_{D'}\|_K^2 - 2\langle f_D - f_{D'},\, X\rangle_K \bigr) \ge \alpha \Bigr)
&\le P\Bigl( \delta\Delta Z \ge \delta^2\Bigl( \alpha - \tfrac{1}{2\delta^2}\|f_D - f_{D'}\|_K^2 \Bigr) \Bigr) \\
&\le P\Bigl( Z \ge \tfrac{\delta}{\Delta}\Bigl( \alpha - \tfrac{\Delta^2}{2\delta^2} \Bigr) \Bigr) \\
&= P\Bigl( Z \ge \sqrt{2\log(2/\beta)} - \tfrac{\alpha}{2\sqrt{2\log(2/\beta)}} \Bigr) \\
&\le P\Bigl( Z \ge \sqrt{2\log(2/\beta)} - \tfrac{1}{2\sqrt{2\log(2/\beta)}} \Bigr) \le \beta,
\end{align*}
as long as $\alpha \le 1$ (Hall et al., 2013).
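The final chain of inequalities is easy to verify numerically. The sketch below (not from the paper) checks, for several values of β, that P(Z ≥ √(2 log(2/β)) − 1/(2√(2 log(2/β)))) ≤ β, writing the standard normal tail via the complementary error function.

```python
import math

def normal_tail(x):
    """P(Z >= x) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# check the bound in the last step of the proof for several beta values
for beta in (0.1, 0.01, 0.001, 1e-6):
    c = math.sqrt(2.0 * math.log(2.0 / beta))
    tail = normal_tail(c - 1.0 / (2.0 * c))
    assert tail <= beta, (beta, tail)
```

For β = 0.1 the bound already holds with room to spare (the tail is about 0.012), and the slack grows as β shrinks.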
Proof of Theorem 4. The upper bound for $\Delta_n^2$ is derived as follows:
\[
\Delta_n^2 = \sup
\]
B Derivation of RKHS Estimate
Recall that
We can express $g$ as
$g(M) = 1$
We then have that
$g(M) = 1$
Lemma 2. Let $h$ and $M$ be elements of a real separable Hilbert space $H$. Then we have that
1. $\frac{\partial}{\partial M}\langle h, M\rangle_H = h$;
2. $\frac{\partial}{\partial M}\langle A(M), M\rangle_H = 2A(M)$,
where $A$ is a symmetric bounded linear operator.
We then have that
\[
\frac{\partial g(M)}{\partial M} = -2C(\bar X) + 2C(M) + 2\phi M.
\]
Setting the above equal to zero, we have that
\[
C(\bar X) = C(\hat\mu) + \phi\hat\mu, \tag{7}
\]
or
\[
\hat\mu = (C + \phi I)^{-1} C(\bar X).
\]
In addition, $\hat\mu$ can be written as $\hat\mu(t) = \sum_{j=1}^{\infty} \langle\hat\mu, v_j\rangle_H v_j(t)$. So property (7) and $\langle C(h_1), h_2\rangle_K = \langle h_1, h_2\rangle_H$ imply that
\begin{align*}
\langle C(\bar X), v_j\rangle_K &= \langle C(\hat\mu), v_j\rangle_K + \phi \langle\hat\mu, v_j\rangle_K \\
\langle \bar X, v_j\rangle_H &= \langle\hat\mu, v_j\rangle_H + \phi \Bigl\langle \hat\mu, \frac{v_j}{\lambda_j} \Bigr\rangle_H,
\end{align*}
and eventually
\[
\langle\hat\mu, v_j\rangle_H = \frac{\lambda_j}{\lambda_j + \phi}\, \langle \bar X, v_j\rangle_H.
\]
Hence we have
\begin{align*}
\hat\mu(t) &= \sum_{j=1}^{\infty} \langle\hat\mu, v_j\rangle_H\, v_j(t) \\
&= \sum_{j=1}^{\infty} \frac{\lambda_j}{\lambda_j + \phi}\, \langle \bar X, v_j\rangle_H\, v_j(t) \\
&= \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{\infty} \frac{\lambda_j}{\lambda_j + \phi}\, \langle X_i, v_j\rangle_H\, v_j(t).
\end{align*}
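On a discrete grid, this spectral form of the estimator can be sketched as below. The equal-weight quadrature discretization of C and the squared-exponential kernel are illustrative assumptions; the key point is the shrinkage factor λ_j/(λ_j + φ) applied to the coefficients of the sample mean.

```python
import numpy as np

def rkhs_mean(X, t, phi, rho=0.1):
    """mu_hat = sum_j lambda_j/(lambda_j + phi) <Xbar, v_j> v_j, computed from
    the eigendecomposition of the covariance operator C discretized on the
    grid t. Equivalent to (C + phi I)^{-1} C Xbar."""
    w = (t[-1] - t[0]) / (len(t) - 1)                   # quadrature weight
    C = np.exp(-((t[:, None] - t[None, :]) ** 2) / rho ** 2) * w
    lam, V = np.linalg.eigh(C)                          # eigenpairs of C
    lam = np.clip(lam, 0.0, None)                       # clip rounding noise
    xbar = X.mean(axis=0)
    return V @ (lam / (lam + phi) * (V.T @ xbar))       # spectral shrinkage

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
X = 0.1 * np.sin(2 * np.pi * t) + 0.02 * rng.standard_normal((25, len(t)))
mu_hat = rkhs_mean(X, t, phi=0.01)
```

Since every factor λ_j/(λ_j + φ) lies in [0, 1], the estimate shrinks the sample mean: larger φ damps the high-frequency (small-λ_j) components more heavily, which is exactly the oversmoothing observed for the largest φ values in the application.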