On the Existence of Densities for Functional Data and their Link to Statistical Privacy
Ardalan Mirshani
Matthew Reimherr
Aleksandra Slavković∗
Abstract
In statistical privacy (or statistical disclosure control) the goal is to minimize the potential for identification of individual records or sensitive characteristics while at the same time ensuring that the released information provides accurate and valid statistical inference. Differential Privacy, DP, has emerged as a mathematically rigorous definition of risk and, more broadly, as a framework for releasing privacy enhanced versions of a statistical summary. This work develops an extensive theory for achieving DP with functional data or function valued parameters more generally. Functional data analysis, FDA, as well as other branches of statistics and machine learning often deal with function valued parameters. Functional data and/or functional parameters may contain unexpectedly large amounts of personally identifying information, and thus developing a privacy framework for these areas is critical in the era of big data. Our theoretical framework is based on densities over function spaces, which is of independent interest to FDA researchers, as densities have proven to be challenging to define and utilize for FDA models. Of particular interest to researchers working in statistical disclosure control, we demonstrate how even small amounts of over smoothing or regularizing can produce releases with substantially improved utility. We carry out extensive simulations to examine the utility of privacy enhanced releases and consider an application to Multiple Sclerosis and Diffusion Tensor Imaging.
KEYWORDS: Differential Privacy, Functional Data, Hilbert Space, Density Estimation, Reproducing Kernel Hilbert Space
1 Introduction
New studies, surveys, and technologies are resulting in ever richer and more informative data
sets. Data being collected as part of the “big data revolution” occurring in the sciences have
dramatically expanded the pace of scientific progress over the last several decades, but often contain a significant amount of personal or subject level information. These data and their corresponding analyses present substantial challenges for preserving subjects' privacy as researchers attempt to understand what information can be publicly released without impeding scientific advancement and policy making (Lane et al., 2014).

∗Address for correspondence: Matthew Reimherr, Department of Statistics, Pennsylvania State University, 411 Thomas Building, University Park, PA 16802, USA. E-mail: [email protected]
One type of big data that has been heavily researched in the statistics community over the last two decades is functional data, with the corresponding branch of statistics called functional data analysis, FDA. FDA is concerned with conducting statistical inference on samples of functions, trajectories, surfaces, and other similar objects. Such tools have become increasingly necessary as our data gathering technologies become more sophisticated. FDA methods have proven very useful in a wide variety of fields including economics, finance, genetics, geoscience, anthropology, and kinesiology, to name only a few (Ramsay and Silverman, 2002, 2005; Ferraty and Romain, 2011; Horváth and Kokoszka, 2012; Kokoszka and Reimherr, 2017). Indeed, nearly any data rich area of science will eventually come across applications that are amenable to FDA techniques. However, functional data are also a rich source of potentially personally identifiable information, e.g., genomic data, including genomic sequences (Erlich and Narayanan, 2014; Schadt et al., 2012), and biomedical imaging (Kulynych, 2002).
To date, there has been very little work concerning FDA and statistical data privacy, that is, statistical disclosure control (SDC) or statistical disclosure limitation (SDL), the branch of statistics concerned with limiting identifying information in released data and summaries while maintaining their utility for valid statistical inference. SDC has a rich history of both methodological developments and applications for the "safe" release of altered (or masked) microdata and tabular data; see, e.g., Dalenius (1977); Rubin (1993); Willenborg and De Waal (1996); Fienberg and Slavković (2010); Hundepool et al. (2012) for general reviews of the area and the related methodology. Hall et al. (2013) provide the most substantial contribution to privacy of FDA data to date; the authors are primarily interested in nonparametric smoothing, releasing point-wise evaluations of functions with a particular focus on kernel density estimators. They work within the framework of Differential Privacy, DP, which emerged from theoretical computer science as a rigorous quantification of disclosure risk (Dwork, 2006; Dwork et al., 2006b). One of the major findings of Hall et al. (2013) is the connection between DP and Reproducing Kernel Hilbert Spaces. This connection will be explored here in even greater depth. Recently, Aldà and Rubinstein (2017) extended this work by considering a Laplace (instead of a Gaussian) mechanism and focused on releasing an approximation based on Bernstein polynomials, exploiting their close connection to point-wise evaluation on a grid or mesh.
In this work, we move beyond Hall et al. (2013) by establishing DP for functional data more broadly. We propose a mechanism for achieving DP when releasing any set of linear functionals of a functional object, which includes point-wise releases as a special case. We will show that this mechanism also protects any nonlinear function of the functional object, or even a full function release. To establish the necessary probabilistic bounds for DP we utilize functional densities. This represents a major step for FDA, as densities for functional data are sometimes thought not to exist (Delaigle and Hall, 2010). Lastly, we demonstrate these tools by estimating the mean function via RKHS smoothing.

One of the major findings of this work is the interesting connection between regularization and privacy. In particular, we show that by slightly over smoothing, one can achieve DP with substantially less noise, thus better preserving the utility of the release. It turns out that a great deal of personal information can reside in the "higher frequencies" of a functional parameter estimate. To more fully illustrate this point, we demonstrate how cross-validation for choosing smoothing parameters can be dramatically improved when the cross-validation incorporates the function to be released. Previous works concerning DP and regularization have primarily focused on performing shrinkage regression in a DP manner (e.g., Kifer et al. (2012); Chaudhuri et al. (2011)), but not on using the smoothing to recover some utility as we propose here.
The remainder of the paper is organized as follows. In Section 2 we present the necessary background material for DP and FDA. In Section 3 we present our primary results concerning releasing a finite number of linear functionals, followed by full function and nonlinear releases. We also present our work related to densities for FDA, which is of independent interest. In Section 4 we derive global sensitivity bounds for RKHS smoothing, which are needed to implement our mechanism. In Section 5 we present simulations to highlight the role of different parameters, while in Section 6 we present an application to Diffusion Tensor Imaging of Multiple Sclerosis patients. In Section 7 we discuss our results and present concluding remarks. All mathematical proofs and derivations can be found in the Appendix.
2 Background

2.1 Differential Privacy
Let $D$ be a countable (potentially infinite) population of records, and denote by $\mathcal{D}$ the collection of all $n$-dimensional subsets of $D$. Throughout we let $D$ and $D'$ denote elements of $\mathcal{D}$. Notationally, we omit the dependence on $n$ for ease of exposition. Differential Privacy, DP, was first introduced by Dwork (2006); Dwork et al. (2006b) as a criterion for evaluating the efficacy of a privacy mechanism. In this work we use a form called $(\alpha, \beta)$-DP (also commonly referred to as $(\epsilon, \delta)$-DP), where $\alpha$ and $\beta$ are privacy parameters with smaller values indicating stronger privacy. DP is a property of the privacy mechanism applied to the data summary, in this case $f_D := f(D)$, prior to release. For simplicity, we will denote the application of this mechanism using a tilde; so $\tilde{f}_D := \tilde{f}(D)$ is the privacy enhanced version of $f_D$. Probabilistically, we view $\tilde{f}_D$ as a random variable indexed by $D$ (which is not treated as random). This criterion can be defined for any probability space.
Definition 1. Let $f : \mathcal{D} \to \Omega$, where $\Omega$ is some metric space. Let $\tilde{f}_D$ be random variables, indexed by $D$, taking values in $\Omega$ and representing the privacy mechanism. The privacy mechanism is said to achieve $(\alpha, \beta)$-DP if for any two datasets, $D$ and $D'$, which differ in only one record, we have
$$P(\tilde{f}_D \in A) \le e^{\alpha} P(\tilde{f}_{D'} \in A) + \beta,$$
for any measurable set $A$.
The most common setting is when $\Omega = \mathbb{R}$, i.e., the summary is real valued. In Section 3.1 we will consider the case when $\Omega = H$, a real separable Hilbert space. At a high level, achieving $(\alpha, \beta)$-DP means that the object to be released changes relatively little if the sample on which it is based is perturbed slightly. This change is related to what Dwork (2006) called "sensitivity". Another nice feature is that if $\tilde{f}_D$ achieves DP, then so does any measurable transformation of it; see Dwork et al. (2006a,b) for the original results, Wasserman and Zhou (2010) for its statistical framework, and Dwork and Roth (2014) for a more recent detailed review of relevant DP results. For convenience, we restate the lemma and its proof with respect to our notation in Appendix A.
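For intuition, the scalar Gaussian mechanism underlying this framework is easy to sketch in code. The following is our own illustration, not code from the paper; it uses the noise calibration $\delta^2 = 2\log(2/\beta)\Delta^2/\alpha^2$ that appears later in Theorem 1, and the bounded-data mean release with its $1/N$ sensitivity is a standard textbook example.

```python
import numpy as np

def gaussian_mechanism(summary, sensitivity, alpha, beta, rng=None):
    """Release a real-valued summary with (alpha, beta)-DP by adding
    Gaussian noise scaled to the summary's global sensitivity."""
    rng = np.random.default_rng() if rng is None else rng
    # Noise calibration: delta^2 = 2 log(2/beta) * sensitivity^2 / alpha^2.
    delta = np.sqrt(2.0 * np.log(2.0 / beta)) * sensitivity / alpha
    return summary + delta * rng.standard_normal()

# Example: privately release a sample mean of values known to lie in [0, 1].
# Replacing one record changes the mean by at most 1/N, so sensitivity = 1/N.
data = np.array([0.2, 0.4, 0.9, 0.5, 0.7])
release = gaussian_mechanism(data.mean(), sensitivity=1.0 / len(data),
                             alpha=1.0, beta=0.1)
```

Smaller $\alpha$ or $\beta$ inflates $\delta$, trading utility for stronger privacy, exactly the trade-off explored in the simulations of Section 5.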
2.2 Functional Data Analysis
Here we provide a few details concerning FDA that will be important later on. Throughout we let $H$ denote a real separable Hilbert space, i.e., a complete inner product space that admits a countable basis. The most commonly encountered space in FDA is $L^2[0,1]$, though many other spaces are of increasing interest: for example, multivariate or panel functional data, functional data over multidimensional domains (such as space-time), as well as spaces which incorporate more structure on the functions, such as Sobolev spaces and Reproducing Kernel Hilbert Spaces.

Throughout, we will let $\theta : \mathcal{D} \to H$ denote the particular summary of interest and, for notational ease, we define $\theta_D := \theta(D)$. In Section 3.1 we consider the case where we want to release a finite number of continuous linear functionals of $\theta_D$, whereas in Section 3.2 we consider releasing a privacy enhanced version of the entire function or some nonlinear transformation of it (such as a norm).
The backbone of our privacy mechanism is the same as in Hall et al. (2013), and is used extensively across the DP literature. In particular, we will add Gaussian noise to the summary and show how the noise can be calibrated to achieve DP. We say $Z$ is a Gaussian process in $H$ if $\langle Z, h \rangle_H$ is Gaussian in $\mathbb{R}$, for any $h \in H$. Every Gaussian process in $H$ is uniquely parametrized by its mean $\mu \in H$ and its covariance operator (which is symmetric, positive definite, and nuclear) $C : H \to H$ (Laha and Rohatgi, 1979, Page 488). One then has that
$$\langle Z, h \rangle_H \sim N(\langle \mu, h \rangle_H, \langle C(h), h \rangle_H),$$
for any $h \in H$. We use the shorthand notation $N$ to denote the Gaussian distribution over $\mathbb{R}$, but include subscripts for any other space, e.g., $N_H$ for $H$.
Let $K$ be a symmetric, positive definite, bounded linear operator over $H$ that admits a spectral decomposition given by
$$K = \sum_{i=1}^{\infty} \lambda_i v_i \otimes v_i, \qquad (1)$$
where $\otimes$ denotes the tensor product between two elements of $H$, the result of which is an operator, $(x \otimes y)(h) := \langle y, h \rangle_H x$. The $\lambda_i$ and $v_i$ denote the eigenvalues and eigenfunctions, respectively, of $K$. Without loss of generality, one can assume that the $v_i$ form an orthonormal basis of $H$. The operator $K$ can be used to define a subspace of $H$ that is also a Hilbert space. In particular,
$$\mathcal{K} := \left\{ h \in H : \sum_{i=1}^{\infty} \frac{\langle h, v_i \rangle^2}{\lambda_i} < \infty \right\}. \qquad (2)$$
It is usually assumed that $\lambda_i \neq 0$; however, the case where $\lambda_i = 0$ for some $i$ is readily included with the convention $0/0 = 0$ in the above summation. In that case, it is understood that if $\lambda_i = 0$ then $\langle h, v_i \rangle = 0$ for all $h \in \mathcal{K}$. For example, if only a finite number of $\lambda_i$ were nonzero then $\mathcal{K}$ would be finite dimensional. The space $\mathcal{K}$ is a Hilbert space when equipped with the inner product $\langle x, y \rangle_{\mathcal{K}} = \sum \lambda_i^{-1} \langle x, v_i \rangle \langle y, v_i \rangle$. When $H = L^2[0,1]$ and $K$ is an integral operator with kernel $k(t,s)$, then $\mathcal{K}$ is also a Reproducing Kernel Hilbert Space, RKHS (Berlinet and Thomas-Agnan, 2004).
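The spectral construction in (1) and (2) can be mimicked numerically. Below is a small sketch of our own (not from the paper) that eigendecomposes a kernel Gram matrix on a uniform grid, a stated simplification of the integral operator, and evaluates the $\mathcal{K}$-norm in (2), dropping numerically zero eigenvalues per the $0/0 = 0$ convention.

```python
import numpy as np

def rkhs_norm_sq(f_vals, grid, kernel, tol=1e-10):
    """Approximate ||f||_K^2 = sum_i <f, v_i>^2 / lambda_i using the
    eigendecomposition of the discretized integral operator (1)-(2)."""
    w = grid[1] - grid[0]                       # quadrature weight (uniform grid)
    M = kernel(grid[:, None], grid[None, :]) * w  # discretized operator K
    lam, V = np.linalg.eigh(M)                  # eigenvalues / eigenvectors
    scores = (V.T @ f_vals) * w                 # approximate <f, v_i> in L^2
    keep = lam > tol                            # convention 0/0 = 0: drop lambda_i ~ 0
    return float(np.sum(scores[keep] ** 2 / (lam[keep] * w)))

# Usage: for the rank-one kernel k(t,s) = 2 sin(pi t) sin(pi s), the only
# eigenpair is (1, sqrt(2) sin(pi t)), so f = 3 v has ||f||_K^2 = 9.
grid = np.linspace(0.0, 1.0, 101)
v = np.sqrt(2.0) * np.sin(np.pi * grid)
print(rkhs_norm_sq(3.0 * v, grid,
                   lambda t, s: 2.0 * np.sin(np.pi * t) * np.sin(np.pi * s)))
```

A diverging value of this sum is exactly the compatibility failure discussed in Section 3 below: the function fails to lie in $\mathcal{K}$.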
3 Privacy Enhanced Functional Data
In this section we present our main results. The mechanism we use for guaranteeing differential privacy is to add Gaussian noise before releasing $\theta_D$. In other words, our releases will be based on a private version $\tilde{\theta}_D$, where $\tilde{\theta}_D = \theta_D + \delta Z$. However, it turns out that not just any Gaussian noise, $Z$, can be used. In particular, the options for choosing $Z$ depend heavily on the summary $\theta_D$. This is made explicit in the following definition.

Definition 2. We say that the summary $\theta_D$ is compatible with a Gaussian noise, $Z \sim N_H(0, C)$, if $\theta_D \in \mathcal{K}$ for any $D \in \mathcal{D}$, where $\mathcal{K}$ is defined in (2) with kernel $K = C$.

In other words, for a particular noise to be viable for the privacy mechanism, it must be the case that the summary resides in the weighted subspace derived from the covariance operator of the noise. Recall that every covariance operator is compact, symmetric, and positive definite and thus admits the expansion (1). Our next definition comes from Hall et al. (2013), but we state it here more generally.
Definition 3. The global sensitivity of a summary $\theta_D$, with respect to a Gaussian noise $Z \sim N_H(0, C)$, is given by
$$\Delta^2 = \sup_{D' \sim D} \|\theta_D - \theta_{D'}\|_{\mathcal{K}}^2,$$
where $D' \sim D$ means the two sets differ at one record only. The space, $\mathcal{K}$, is defined in (2) with kernel $K = C$.
The global sensitivity (GS) is a central quantity in the theory of DP. In particular, the amount of noise, $\delta Z$, depends directly on the global sensitivity. It should be noted that there are other notions of "sensitivity" in the DP literature, such as local sensitivity, but here our focus is on the global sensitivity that typically leads to the worst case definition of risk under DP; for a detailed review of DP theory and concepts, see Dwork and Roth (2014). If a summary is not compatible with a noise, then it is possible for the global sensitivity to be infinite, in which case no finite amount of noise would be able to preserve privacy in the sense of satisfying DP. The following example illustrates this point: if the summaries do not all lie in $\mathcal{K}$, then no level of noise can be added that guarantees scalar DP holds uniformly across $\langle \theta_D, h \rangle$ for any $h$.
Example 1. Let $Z$ be Brownian motion over $[0,1]$, which means that the kernel of $C$ is given by
$$c(t,s) = \min\{t, s\}, \qquad t, s \in [0,1].$$
The kernel, $c$, admits the following spectral decomposition:
$$c(t,s) = \sum_{j=1}^{\infty} \frac{2}{(j-1/2)^2 \pi^2} \sin((j-1/2)\pi t) \sin((j-1/2)\pi s),$$
so that
$$\lambda_j = \frac{1}{(j-1/2)^2 \pi^2} \quad \text{and} \quad v_j(t) = \sqrt{2} \sin((j-1/2)\pi t).$$
Let $D$ consist of $N$ iid realizations of the following process:
$$X_i(t) = Y_i \sum_{j=1}^{\infty} \frac{\sqrt{2}}{(j-1/2)^{3/4} \pi} \sin((j-1/2)\pi t), \qquad i = 1, \dots, N,$$
where the $Y_i$ are iid standard normal. Then we have that
$$\|X_i\|_H^2 = Y_i^2 \sum_{j=1}^{\infty} \frac{1}{(j-1/2)^{3/2} \pi^2} < \infty \quad \text{but} \quad \|X_i\|_{\mathcal{K}}^2 = Y_i^2 \sum_{j=1}^{\infty} \frac{(j-1/2)^2}{(j-1/2)^{3/2}} = \infty.$$
So the $X_i$ are not in the RKHS, $\mathcal{K}$, defined using the kernel $c(t,s)$. Now suppose that $\theta_D = X$ is a single element of $D$, and we aim to release a privacy enhanced version of $f_D := \langle \theta_D, v_j \rangle$. Using classic results from univariate DP (e.g., Dwork et al., 2006a), a private version would be
$$\langle X, v_j \rangle_H + \delta_j \langle Z, v_j \rangle_H,$$
where
$$\delta_j^2 = \frac{2 \log(2/\beta)}{\alpha^2} \Delta_j^2 \quad \text{and} \quad \Delta_j^2 = \sup_{i,k} \frac{\langle X_i - X_k, v_j \rangle_H^2}{\langle C(v_j), v_j \rangle_H}.$$
Here $\Delta_j$ denotes the scalar global sensitivity of $\langle X, v_j \rangle$ with respect to $\langle Z, v_j \rangle$. We can express
$$\sup_{i,k} \frac{\langle X_i - X_k, v_j \rangle_H^2}{\langle C(v_j), v_j \rangle_H} = \sup_{i,k} (Y_i - Y_k)^2 \frac{(j-1/2)^2}{(j-1/2)^{3/2}} = (j-1/2)^{1/2} \sup_{i,k} (Y_i - Y_k)^2.$$
However, this quantity goes to infinity as $j \to \infty$. Therefore, there is no level of noise, $\delta$, we can add to $f_D$ which would guarantee $(\alpha, \beta)$-DP uniformly across the choice of $h$.
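Example 1 can be checked numerically. The sketch below (our own illustration, with hypothetical $Y_i$ draws) evaluates the per-frequency scalar sensitivity $\Delta_j^2$ directly from the scores and eigenvalues given above, and shows its $(j-1/2)^{1/2}$ growth.

```python
import numpy as np

# Under Brownian-motion noise, lambda_j = 1/((j-1/2)^2 pi^2) and the process
# X_i has scores <X_i, v_j> = Y_i / ((j-1/2)^(3/4) pi), so
#   Delta_j^2 = sup_{i,k} <X_i - X_k, v_j>^2 / lambda_j
#             = (j - 1/2)^(1/2) * sup_{i,k} (Y_i - Y_k)^2,
# which diverges as j grows.
def delta_j_sq(j, y_vals):
    a = j - 0.5
    score_gap_sq = (y_vals.max() - y_vals.min()) ** 2 / (a ** 1.5 * np.pi ** 2)
    lam_j = 1.0 / (a ** 2 * np.pi ** 2)
    return score_gap_sq / lam_j          # equals sqrt(a) * sup (Yi - Yk)^2

y = np.array([-1.3, 0.2, 0.8, 2.1])      # hypothetical Y_i draws
for j in [1, 10, 100, 1000]:
    print(j, delta_j_sq(j, y))           # grows without bound in j
```

No single $\delta$ can dominate every $\Delta_j$, which is the compatibility failure the example is designed to exhibit.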
3.1 Releasing Finite Projections
We begin with the simpler task of releasing a finite vector of continuous linear functionals of $\theta_D$. By the Riesz Representation Theorem, every continuous linear functional, $T(\cdot)$, over $H$ is of the form $T(\cdot) = \langle \cdot, h \rangle_H$. Thus, equivalently, we aim to release a private version of $\{\langle \theta_D, h_1 \rangle_H, \dots, \langle \theta_D, h_N \rangle_H\}$ for some finite number $N$ and fixed $\{h_i \in H\}_{i=1}^N$.

Theorem 1. Assume $\theta_D$ is compatible with $Z \sim N_H(0, C)$, and define
$$\tilde{\theta}_D = \theta_D + \delta Z \quad \text{with} \quad \delta^2 = \frac{2 \log(2/\beta)}{\alpha^2} \Delta^2.$$
Now define $f : \mathcal{D} \to \mathbb{R}^N$ as $f_D := \{\langle \theta_D, h_i \rangle\}$ and $\tilde{f}_D := \{\langle \tilde{\theta}_D, h_i \rangle\}$, for $\{h_i \in H\}_{i=1}^N$. Then $\tilde{f}_D$ achieves $(\alpha, \beta)$-DP for any finite choice of $N$ and any $h_i \in H$.
Theorem 1 can be viewed as an extension of Hall et al. (2013), who allow for point-wise releases only. If the Hilbert space $H$ is also taken to be an RKHS, then Theorem 1 implies point-wise releases are protected as well, since in an RKHS point-wise evaluations are also continuous linear functionals. However, this theorem allows the release of any continuous linear functional. Additionally, notice that while $\theta_D$ must lie in $\mathcal{K}$, the $h_i$ can be any functions in $H$, which dramatically increases the release options as compared to Hall et al. (2013) or Aldà and Rubinstein (2017).
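In a discretized setting, the mechanism of Theorem 1 amounts to perturbing the summary curve once and then reading off as many linear functionals as desired. The sketch below is our own illustration: the grid, the simple quadrature, and the choice of Brownian-motion noise are assumptions made for concreteness, and $\delta$ is taken as given rather than computed from $\Delta$.

```python
import numpy as np

def private_projections(theta_vals, h_list, grid, delta, rng=None):
    """Add one calibrated Gaussian noise path to the summary curve, then
    return the private values of the functionals <theta~, h_i>."""
    rng = np.random.default_rng() if rng is None else rng
    w = grid[1] - grid[0]
    # Sample a Brownian motion path on the grid (covariance min(t, s)).
    bm = np.cumsum(rng.standard_normal(len(grid))) * np.sqrt(w)
    theta_tilde = theta_vals + delta * bm
    # Continuous linear functionals approximated by quadrature.
    return [w * np.dot(theta_tilde, h) for h in h_list]

grid = np.linspace(0, 1, 200)
theta = np.sin(np.pi * grid)                 # summary to protect
hs = [np.ones_like(grid), grid]              # two functionals: mean value, first moment
release = private_projections(theta, hs, grid, delta=0.1,
                              rng=np.random.default_rng(1))
```

Note that only one noise draw is used for all functionals; drawing fresh noise per functional would spend additional privacy budget.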
3.2 Full Function and Nonlinear Releases
While Section 3.1 covers a number of important cases, it does not cover all releases of potential interest, such as norms or derivatives. To guarantee privacy for these types of releases, we need to establish $(\alpha, \beta)$-DP for the entire function, not just finite projections. This means that in Definition 1 the metric space is taken to be $H$, which is infinite dimensional.
In previous works (e.g., Dwork et al., 2014; Hall et al., 2013), the probability inequalities in Definition 1 were established using bounds based on multivariate normal densities. This presents a serious problem for FDA, as there is nearly no work in the FDA literature which employs densities. This is due to the great difficulty in defining them for Hilbert space processes. For example, Delaigle and Hall (2010) are one of the few to tackle this problem, but get around it by defining a density only for finite "directions" of functional objects. Another example is Bongiorno and Goia (2015), who define pseudo-densities by carefully controlling "small ball" probabilities. Both works claim that for a functional object the density "generally does not exist." However, this turns out to be a technically incorrect claim, while still often being true in spirit. The correct statement is that, in general, it is difficult to define a useful density for functional data. In particular, to work with likelihood methods, a family of probability measures should all have a density with respect to the same base measure, which, at present, does not appear to be possible in general for functional data.
The concept of defining densities over function spaces has a very deep history. Given its potential importance in FDA, we will attempt to reference some of the major works. These results have been used extensively in areas such as mathematical finance and spatial statistics. Recall that a density is a Radon-Nikodym derivative (RND) of one measure with respect to another (herein called the base measure). For probability models, the base measure is typically Lebesgue or counting measure for continuous and discrete random variables, respectively. RNDs were first introduced in the works of Radon (1913) and Nikodym (1930). The existence of RNDs depends on the absolute continuity of measures, and thus a great deal of the work that followed focused on determining when infinite dimensional measures were equivalent or orthogonal; for example, see Cameron and Martin (1944, 1945, 1947); Kakutani (1948); Grenander (1950); Baxter (1956); Feldman (1958); Girsanov (1960); Varberg (1961); Rozanov (1964); Shepp (1964). Given two equivalent measures, what is the density of one with respect to the other? This question also has a very rich body of literature, including Rozanov (1962); Rao and Varadarajan (1963); Shepp (1966) and Skorohod (1967). In the early 1970s Black and Scholes published their landmark paper on option pricing (Black and Scholes, 1973), which utilized much of this theory, while Ibragimov and Rozanov published a book (Ibragimov and Rozanov, 1978) summarizing much of the discussed work, especially as it related to stationary Gaussian processes. Thereafter, much of the work has focused on specific areas of application, as the bulk of the foundational theory had been established.
Our goal in using densities is relatively straightforward. We aim to define densities (with respect to the same base measure) for the family of probability measures induced by $\{f(D) + \delta Z : D \in \mathcal{D}_n\}$, where $Z$ is a mean zero Gaussian process in $H$ with covariance operator $C$. It turns out that this is in fact possible because we are adding the exact same type of noise to each element. We give the following theorem, which can be viewed as a slight generalization and reformulation of Theorem 6.1 from Rao and Varadarajan (1963).

Theorem 2. Assume that the summary $\theta_D$ is compatible with a noise $Z$. Denote by $Q$ the probability measure over $H$ induced by $\delta Z$, and by $\{P_D : D \in \mathcal{D}\}$ the family of probability measures over $H$ induced by $\theta_D + \delta Z$. Then every measure $P_D$ has a density over $H$ with respect to $Q$, and the density is given by
$$\frac{dP_D}{dQ}(h) = \exp\left\{ -\frac{1}{2\delta^2} \left( \|f_D\|_{\mathcal{K}}^2 - 2 \langle h, f_D \rangle_{\mathcal{K}} \right) \right\},$$
$Q$ almost everywhere. This density is unique up to a set of $Q$ measure zero.
Theorem 2 implies that, for any measurable set $A \subset H$, we have
$$P_D(A) = \int_A \frac{dP_D}{dQ}(h) \, dQ(h).$$
To see why the density should be of the form given in Theorem 2, consider the density of $X_J = \{\langle \theta_D + \delta Z, v_j \rangle\}_{j=1}^J$, where $v_j$ is the $j$th eigenfunction of the covariance operator of $Z$. Then $X_J$ is multivariate normal with mean $\{\langle \theta_D, v_j \rangle\}_{j=1}^J$ and variances $\{\delta^2 \lambda_j\}_{j=1}^J$, which are the corresponding eigenvalues. While one typically defines the density of $X_J$ with respect to Lebesgue measure, we can instead define it with respect to another normal, namely $Z_J = \{\langle \delta Z, v_j \rangle\}_{j=1}^J$, which has the same distribution as $X_J$ except with zero mean. To define this density, we need only take the ratio of their densities with respect to Lebesgue measure. This gives
$$\exp\left\{ -\frac{1}{2\delta^2} \sum_{j=1}^{J} \frac{\langle \theta_D, v_j \rangle^2 - 2 \langle h, v_j \rangle \langle \theta_D, v_j \rangle}{\lambda_j} \right\}.$$
We can then take the limit with respect to $J$ and, by Parseval's identity, we would obtain the expression in Theorem 2. While a rigorous proof can be found in the appendix, this provides some intuition as to why the final density would take such a form.
Now that we have a well defined density, we can establish differential privacy for entire functions.

Theorem 3. Let $f_D := \theta_D$, and assume that it is compatible with a noise $Z$. Then $\tilde{f}_D := f_D + \delta Z$ achieves $(\alpha, \beta)$-DP over $H$, where $\delta$ is defined in Theorem 1.

We also have the following corollary.

Corollary 1. Let $f$ be compatible with a noise $Z$, and let $T$ be any measurable transformation from $H \to S$, where $S$ is a metric space. Then $T(f_D + \delta Z)$ achieves $(\alpha, \beta)$-DP over $S$, where $\delta$ is defined in Theorem 1.

Together, Theorem 3 and Corollary 1 imply that the Gaussian mechanism, first introduced by Hall et al. (2013) and expanded here, gives very broad privacy protection for functional data, as nearly any transformation or manipulation of the privacy enhanced release is guaranteed to maintain DP; this is known as the post-processing property (e.g., see Dwork and Roth (2014)).
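A discretized sketch of Corollary 1 (our own illustration, with Brownian-motion noise and a fixed $\delta$ assumed for concreteness): once the full curve has been released with calibrated Gaussian noise, any measurable post-processing of it, here an $L^2$ norm and a finite-difference derivative, inherits the same $(\alpha, \beta)$-DP guarantee at no extra privacy cost.

```python
import numpy as np

def release_full_function(theta_vals, grid, delta, rng=None):
    """Full-function Gaussian mechanism on a grid (Theorem 3 sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    w = grid[1] - grid[0]
    bm = np.cumsum(rng.standard_normal(len(grid))) * np.sqrt(w)  # noise path
    return theta_vals + delta * bm

def l2_norm(f_vals, grid):
    """Quadrature approximation of the L^2[0,1] norm."""
    w = grid[1] - grid[0]
    return np.sqrt(np.sum(f_vals ** 2) * w)

grid = np.linspace(0, 1, 300)
private_curve = release_full_function(np.sin(np.pi * grid), grid, delta=0.05,
                                      rng=np.random.default_rng(7))
private_norm = l2_norm(private_curve, grid)         # still (alpha, beta)-DP
private_deriv = np.gradient(private_curve, grid)    # still (alpha, beta)-DP
```

The key point is that both derived quantities are computed from the already-private curve, so no further noise or accounting is required.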
4 Global Sensitivity Bounds for RKHS Smoothing
In Sections 5 and 6 we illustrate how to produce private releases of mean function estimates based on RKHS smoothing. Here we derive a corresponding bound on the global sensitivity to be used in generating the privacy enhanced estimates. There are, of course, a multitude of methods for estimating smooth functions; however, an RKHS approach is especially amenable to our privacy mechanism. In this case we can define an RKHS smoother by using the kernel of the noise, $C$, to define the space $\mathcal{K}$, i.e., $K = C$. The RKHS estimate of the mean $\mu$ is then defined as
$$\hat{\mu} = \operatorname*{argmin}_{M \in \mathcal{K}} g(M) \quad \text{where} \quad g(M) = \frac{1}{N} \sum_{i=1}^{N} \|X_i - M\|_H^2 + \phi \|M\|_{\mathcal{K}}^2,$$
where $\phi$ is called the penalty parameter. Here we can see the advantage of an RKHS approach, as it forces the estimate to lie in the space $\mathcal{K}$, which means that the compatibility condition is satisfied. A kernel different from the noise could be used, but one must be careful to make sure that the compatibility condition is met. If $(\lambda_j, v_j)$ are the eigenvalue/eigenfunction pairs of the kernel $K$ and $\{X_i = \sum_{j=1}^{\infty} x_{ij} v_j : i = 1, \dots, N\}$, with $x_{ij} = \langle X_i, v_j \rangle$, then the estimate can be expressed as
$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{\infty} \frac{\lambda_j}{\lambda_j + \phi} x_{ij} v_j; \qquad (3)$$
see Appendix B for details. We then have the following result.
Theorem 4. If the $H$ norm of all elements of the population is bounded by a constant $0 < \tau < \infty$, then the GS for RKHS estimation is bounded by
$$\Delta_n^2 \le \frac{4 \tau^2}{N^2} \sum_{j=1}^{\infty} \frac{\lambda_j}{(\lambda_j + \phi)^2}.$$

The resulting bound is practically very useful. One can rescale the data so that their $H$ bound is, for example, 1, and then the remaining quantities are all tied to the noise/RKHS kernel and the penalty parameter, from which the amount of noise needed to achieve DP can be constructed.
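The whole recipe of this section can be sketched end to end. The code below is our own discretized implementation, not code from the paper: it computes the RKHS smoother (3) via eigen-shrinkage, bounds the global sensitivity with Theorem 4, calibrates $\delta$ as in Theorem 1, and adds noise with the matching covariance. The uniform-grid quadrature is a stated simplification.

```python
import numpy as np

def private_rkhs_mean(X, kernel, grid, phi, alpha, beta, tau, rng=None):
    """Private RKHS mean estimate: shrinkage smoother + Theorem 4 GS bound
    + Theorem 1 noise calibration, all in one discretized sketch."""
    rng = np.random.default_rng() if rng is None else rng
    N = X.shape[0]
    w = grid[1] - grid[0]
    lam, V = np.linalg.eigh(kernel(grid[:, None], grid[None, :]) * w)
    lam = np.clip(lam, 0.0, None)             # numerical negatives -> 0
    # RKHS smoother (3): shrink each basis score by lambda_j / (lambda_j + phi).
    scores = V.T @ X.mean(axis=0)
    mu_hat = V @ (lam / (lam + phi) * scores)
    # Theorem 4: Delta^2 <= 4 tau^2 / N^2 * sum_j lambda_j / (lambda_j + phi)^2.
    gs_sq = 4.0 * tau ** 2 / N ** 2 * np.sum(lam / (lam + phi) ** 2)
    # Theorem 1: delta^2 = 2 log(2/beta) * Delta^2 / alpha^2.
    delta = np.sqrt(2.0 * np.log(2.0 / beta) * gs_sq) / alpha
    # Noise drawn with the same covariance C = K (eigenvalues lam).
    Z = V @ (np.sqrt(lam) * rng.standard_normal(len(grid)))
    return mu_hat + delta * Z

grid = np.linspace(0, 1, 100)
kern = lambda t, s: np.exp(-(t - s) ** 2 / (2 * 0.2 ** 2))  # squared exponential
rng = np.random.default_rng(3)
X = np.sin(np.pi * grid) + 0.3 * rng.standard_normal((25, len(grid)))
mu_tilde = private_rkhs_mean(X, kern, grid, phi=1e-3, alpha=1.0, beta=0.1,
                             tau=2.0, rng=rng)
```

Increasing $\phi$ shrinks both $\hat{\mu}$ and the sensitivity bound, which is precisely the smoothing/noise trade-off examined in the simulations below.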
5 Empirical Study
In this section we present simulations with $H = L^2[0,1]$ to explore the impact of different quantities on the utility of privacy enhanced releases. This exploration will occur via the problem of estimating the mean function from a random sample of functional observations using RKHS smoothing, as discussed in Section 4. The quantities we will explore include the kernel function $K(t,s)$ used to define $\mathcal{K}$, the privacy parameters $(\alpha, \beta)$, and the penalty parameter $\phi$ that (combined with $K$) controls the level of smoothing.
For the RKHS, $\mathcal{K}$, we consider four popular kernels, $K_1$ through $K_4$, given in (4). Each kernel is from the Matérn family of covariances (Stein, 2012), though the first is also known as the Gaussian or squared exponential kernel and the last is also known as the exponential, Laplacian, or Ornstein-Uhlenbeck kernel.
We simulate data using the Karhunen-Loève expansion, a common approach in FDA simulation studies. In particular we take
$$X_i(t) = \mu(t) + \sum_{j=1}^{m} \xi_{ij} v_j(t),$$
where the scores $\xi_{ij}$ are independent mean zero normals with variances $\lambda_j$ (decaying at a rate governed by the smoothness parameter $p$, with the $v_j$ the corresponding eigenfunctions), and $m$ was taken as the largest value such that $\lambda_m$ was numerically different from zero in R (usually about $m = 50$). The remaining parameters will be fixed in all scenarios, except for the one where they are explicitly varied to consider their effect.

• The penalty parameter we take to be $\phi = 0.001$, except in Scenario 1.
• The range parameter for the kernel used to define $\mathcal{K}$ we take to be $\rho = 0.001$, except in Scenario 2.
• The covariance function for the noise will be $K_1$ with $\rho = 0.2$, except in Scenario 3. In all settings the RKHS kernel and the noise kernel will be taken to be the same.
• The smoothness parameter for the data will be taken as $p = 4$, except in Scenario 4.
• The DP parameters will be set at $\alpha = 1$ and $\beta = 0.1$, except in Scenario 5.
• The sample size will be taken as $N = 25$, except in Scenario 6.
• The mean function will be taken as $\mu(t) = 0.1 \sin(\pi t)$, except in Scenario 7.

All of the curves are generated on an equally spaced grid between 0 and 1, with 100 points.
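The simulation design above can be sketched as follows. This is a hedged illustration of our own: the paper's exact eigenvalue decay for the smoothness parameter $p$ and its eigenfunctions are not reproduced here, so we assume a polynomial decay $\lambda_j = j^{-p}$ and a Fourier sine basis purely for concreteness.

```python
import numpy as np

def simulate_kl(N, grid, mu, p, m=50, rng=None):
    """Draw N curves on a grid via a truncated Karhunen-Loeve expansion,
    X_i(t) = mu(t) + sum_j xi_ij v_j(t), with Var(xi_ij) = lambda_j."""
    rng = np.random.default_rng() if rng is None else rng
    j = np.arange(1, m + 1)
    lam = j ** (-float(p))                                  # assumed eigenvalue decay
    V = np.sqrt(2.0) * np.sin(np.pi * j[None, :] * grid[:, None])  # assumed basis
    xi = rng.standard_normal((N, m)) * np.sqrt(lam)         # KL scores
    return mu(grid)[None, :] + xi @ V.T

grid = np.linspace(0, 1, 100)
X = simulate_kl(N=25, grid=grid, mu=lambda t: 0.1 * np.sin(np.pi * t), p=4)
```

Larger $p$ makes the scores decay faster and hence the sample paths smoother, mirroring the role of $p$ in Scenario 4.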
In all of these settings the risk is fixed by choosing the $\alpha$ and $\beta$ in the definition of DP. We thus focus on the utility of the privacy enhanced curves by comparing them graphically to the original estimates and the data. Ideally, the original estimate will be close to the truth and the privacy enhanced version will be close to the original estimate. What we will see is that by compromising slightly on the former, one can make substantial gains in the latter.
Scenario 1: Varying penalty term φ

In this setting all of the previously discussed defaults are used except the penalty, $\phi$, which varies from $10^{-6}$ to 1. In Figure 1 we plot all of the generated curves in grey, the RKHS smoothed mean in green, and the privacy enhanced version in red. We can see that as the penalty increases, both estimates shrink towards each other and to zero. There is a clear "sweet spot" in terms of utility, where the smoothing has helped reduce the amount of noise one has to add to the release.

Figure 1: Original and private RKHS smoothing mean with Gaussian Kernel (K1) for different values of penalty parameter φ (panels: φ = 10⁻⁶, 0.001, 0.1, 1; grey: sample functions, green: original summary, red: private summary).
Scenario 2: Varying kernel range parameter ρ

Here all defaults are used except the range parameter for the noise and RKHS (which are taken to be the same in all settings), which ranges from 0.002 to 2. The results are presented in Figure 2. We see very similar patterns to Scenario 1, where increasing $\rho$ increases the smoothing of both the estimate and its privacy enhanced version. However, increasing $\rho$ smooths more than it shrinks the estimates, and there is still a non-negligible difference between the two estimates for larger values. Practically, both $\rho$ and $\phi$ should be chosen together for the best performance, which we will explore further in Section 6.
Figure 2: Original and private RKHS smoothing mean with Gaussian Kernel (K1) for different values of kernel range parameter ρ (panels: ρ = 0.002, 0.02, 0.2, 2).

Scenario 3: Varying the kernel function k(t, s)

Here we consider the four different kernels given in (4) for both the noise and RKHS kernel. All four produce roughly the same pattern; however, $K_1$ produces curves which are infinitely differentiable, while
the exponential kernel produces curves that are nowhere differentiable (they follow an Ornstein-Uhlenbeck process). The two Matérn covariances give paths that have either one (K3) or two (K2) derivatives. Since the underlying function to be estimated is already very smooth, the kernel does not have a substantial impact. However, for more irregular shapes, this choice can play a substantial role in the efficiency of the resulting RKHS estimate.
Scenario 4: Varying the smoothing parameter of samples p
In this setting we vary $p$ from 1.1 to 4, which determines the smoothness of the data, $X_i(t)$. Figure 4 summarizes these results. As we can see, the smoothness of the curves has a smaller impact on the utility of the privacy enhanced versions as compared to other parameters.
Figure 3: Original and private RKHS smoothing mean for different kernels (panels: Gaussian (K1), Matérn 5/2 (K2), Matérn 3/2 (K3), Exponential (K4)).

Still, some differences can be seen. As the curves become smoother, the global sensitivity decreases, implying that less noise needs to be added in order to maintain the desired privacy level, and thus resulting in a higher utility for the privacy enhanced curves. However, the smoothness, in terms of derivatives, of the estimates is not affected, as this is determined by the kernel.
Scenario 5: Varying the privacy parameters (α, β)
In this setting we vary the privacy parameters, $\alpha$ and $\beta$. Figure 5 presents the effects of varying $\alpha$ from 5 to 0.1, while in Figure 6 we vary $\beta$ from 0.1 to $10^{-6}$. As we decrease the parameters, we are requiring a stricter form of privacy, which is reflected in the plots; recall that $\beta = 0$ would give us the stricter form of DP, $\alpha$-DP. As we decrease these values, the overall noise added increases, and we expect larger deviation of the privacy enhanced curve from the mean. As expected, there is less sensitivity in the output to changes in $\beta$ than to $\alpha$. However, as with the previous
Figure 4: Original and private RKHS smoothing mean with Gaussian Kernel (K1) for different values of smoothing parameter of samples p (panels: p = 1.1, 2, 3, 4).

scenarios, stricter settings primarily increase the noise without changing the smoothness of the resulting estimates.
Scenario 6: Varying sample size N
In Figure 7 we vary the sample size from 5 to 100. The results are very similar to those for changing β and α: the sample size does not influence the smoothness of the curves (in terms of derivatives), but the accuracy of the estimate (green curve) gets much better, and so does the utility of the privacy enhanced release.
Figure 5: Original and private RKHS smoothing mean with Gaussian kernel (K1) for different values of the privacy parameter α (α = 5, 1, 0.5, 0.1) when β = 1. Each panel shows the sample functions, the original summary, and the private summary.
Scenario 7: Different underlying mean function µ
Lastly, in Figure 8 we consider three additional mean functions. Overall, the actual function being estimated does not influence the utility of the privacy enhanced version, only the accuracy of the original estimate. This is because the noise to be added is computed from the smoothing parameters and the range of the L2 norm of the data, not from the underlying estimate itself.
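The point that the added noise does not depend on the underlying estimate can be made concrete with a small sketch of the release step. This is an illustration, not the paper's implementation: the function name `private_release`, the squared-exponential noise covariance, and the way the sensitivity Δ enters are assumptions patterned on the Gaussian mechanism of Hall et al. (2013).

```python
import numpy as np

def private_release(f_hat, t, alpha, beta, Delta, rho=0.1, seed=0):
    """Add a scaled Gaussian-process draw to the estimate f_hat on grid t.
    The scale sqrt(2 log(2/beta)) * Delta / alpha follows the Gaussian
    mechanism; the squared-exponential covariance is an illustrative choice
    for the noise process and depends only on (alpha, beta, Delta, rho)."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 * np.log(2.0 / beta)) * Delta / alpha
    K = np.exp(-((t[:, None] - t[None, :]) ** 2) / rho ** 2)
    K += 1e-10 * np.eye(len(t))          # jitter for numerical stability
    Z = rng.multivariate_normal(np.zeros(len(t)), K)
    return f_hat + sigma * Z

t = np.linspace(0.0, 1.0, 50)
mu_hat = 0.1 * np.sin(2 * np.pi * t)     # stand-in for the original summary
loose = private_release(mu_hat, t, alpha=1.0, beta=0.1, Delta=0.5)
strict = private_release(mu_hat, t, alpha=0.1, beta=0.1, Delta=0.5)
```

Because both releases use the same seed, the noise draw is identical, so the stricter α = 0.1 release deviates from the estimate exactly ten times more than the α = 1 release, mirroring the behavior seen in Scenario 5.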
6 Application: Diffusion Tensor Imaging
Regression with Functional Data, or refund (Huang et al., 2016), is an R package that includes a variety of tools related to both linear and nonlinear regression for functional data. DTI or
Figure 6: Original and private RKHS smoothing mean with Gaussian kernel (K1) for different values of the privacy parameter β (β = 0.1, 0.01, 0.001, 10⁻⁶) when α = 1. Each panel shows the sample functions, the original summary, and the private summary.
profiles for the corpus callosum (CCA) and the right corticospinal tract (RCST) for patients with Multiple Sclerosis as well as controls. This type of imaging data is becoming more and more common, and the privacy concerns can be substantial. For example, one would be concerned about releasing the image of a face from a study, but images of the brain or other major organs might also be sensitive, especially if the study is related to some complex disease such as cancer, HIV, etc. Thus it is useful to illustrate how to produce privacy enhanced versions of statistics such as mean functions. We focus on the cca data, which includes 382 patients measured at 93 equally spaced locations along the CCA, and use them to illustrate our proposed methodology. Our aim is to release a privacy enhanced RKHS smoothing estimate of the mean function. We consider only the kernels K1, K3, and K4, which correspond to the Gaussian kernel, the Matérn kernel with ν = 3/2, and the exponential kernel, respectively.
Figure 7: Original and private RKHS smoothing mean with Gaussian kernel (K1) for different sample sizes N (N = 5, 10, 25, 100). Each panel shows the sample functions, the original summary, and the private summary.
The first is regular Cross Validation, CV, and the second we call Private Cross Validation, PCV. In CV we fix φ and then take the ρ that gives the minimum 10-fold cross validation score. We do not select φ based on cross validation because, based on our observations, the minimum score is always obtained at the minimum φ for this data. In PCV we take nearly the same approach; however, when computing the CV score we take the expected difference between our privacy enhanced estimate and the data from the left-out fold. This expected difference is computed using a Monte-Carlo approximation, with 1000 draws from the distribution of the privacy enhanced estimate. We then find both the φ and ρ which give the optimal PCV score based on a grid search.
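The PCV procedure just described can be sketched as follows. This is a toy stand-in rather than the paper's code: the ridge-type smoother, the `noise_scale` placeholder for the sensitivity-based noise level, and the small grid are all assumptions; only the structure (a Monte-Carlo estimate of the expected squared distance to the held-out fold, minimized over a grid of (φ, ρ)) mirrors the text.

```python
import numpy as np

def noise_scale(phi, rho):
    # placeholder for the noise level delta, which in the paper is computed
    # from the sensitivity bound and the privacy parameters (alpha, beta)
    return 0.05 / np.sqrt(1.0 + phi)

def pcv_score(X, t, phi, rho, n_mc=200, n_folds=5, seed=0):
    """Private Cross Validation score: Monte-Carlo estimate of the expected
    squared distance between the privacy-enhanced estimate (fit on the
    training folds) and the curves in the held-out fold."""
    rng = np.random.default_rng(seed)
    K = np.exp(-((t[:, None] - t[None, :]) ** 2) / rho ** 2)
    folds = np.array_split(np.arange(len(X)), n_folds)
    score = 0.0
    for held in folds:
        train = np.setdiff1d(np.arange(len(X)), held)
        # stand-in for the RKHS smoother applied to the training mean
        mu_hat = K @ np.linalg.solve(K + phi * np.eye(len(t)), X[train].mean(0))
        draws = mu_hat + noise_scale(phi, rho) * rng.multivariate_normal(
            np.zeros(len(t)), K + 1e-10 * np.eye(len(t)), size=n_mc)
        # expected squared distance to each held-out curve, averaged
        diffs = draws[:, None, :] - X[held][None, :, :]
        score += np.mean(np.sum(diffs ** 2, axis=2))
    return score / n_folds

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 30)
X = 0.1 * np.sin(2 * np.pi * t) + 0.02 * rng.standard_normal((20, len(t)))
grid = [(phi, rho) for phi in (1e-3, 1e-2, 0.03) for rho in (0.1, 0.5)]
best_phi, best_rho = min(grid, key=lambda pr: pcv_score(X, t, *pr))
```

The grid search is exactly the one-line `min` at the end; in practice the 1000 Monte-Carlo draws of the paper would replace the smaller `n_mc` used here for speed.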
We begin by presenting the results for CV. For each of the kernels, we fixed a value for φ ∈ {0.0001, 0.001, 0.01, 0.03} and then varied ρ over [0.01, 2]. Table 1 shows the optimal ρ for each kernel and each fixed φ.
Figure 8: Original and private RKHS smoothing mean with Gaussian kernel (K1) for different initial mean functions µ: 0.1 sin(2πt), 0.1 sin(πt), 0.1t, and (t − 0.5)². Each panel shows the sample functions, the original summary, and the private summary.
We use the optimal values from Table 1 to produce privacy enhanced estimates. The results are given in Figures 9, 10, and 11 for the exponential, Matérn, and Gaussian kernels, respectively. In each case we see that the utility of the privacy enhanced versions increases as φ increases; however, the largest values of φ produce estimates that are oversmoothed. There is a good trade-off between privacy and utility with φ = 0.01 and K3 or K1 (see Figures 10 and 11), while K4 is a bit rougher and does not perform as well (e.g., Figure 9).
Turning to PCV, Table 2 shows the ranges over which we varied (φ, ρ), along with the optima that minimize the PCV score. We use the optimal parameters to generate privacy enhanced estimates, given in Figure 12. Here we see that the utility of the privacy enhanced estimates is excellent, especially for K1. Using PCV tends to oversmooth the original estimates (green lines), however,
   φ        Exp. Kernel ρ    Mat3/2 Kernel ρ    Gau. Kernel ρ
   0.0001   0.25             0.25               0.01
   0.001    0.20             0.20               0.01
   0.01     0.30             0.30               0.03
   0.03     0.80             0.80               0.05
Table 1: Optimal kernel range parameter ρ for each kernel, selected by CV for each fixed penalty parameter φ, in the DTI dataset.
Figure 9: Mean estimate for CCA and its private release with the Exponential kernel (K4) using CV.
7 Conclusions
In this work we have provided a mechanism for achieving (α, β)-DP for a wide range of summaries related to functional parameter estimates. This work expands dramatically upon the work of Hall et al. (2013), who explored this topic in the context of point-wise releases of functions. Our work covers theirs as a special case, but also includes full-function releases and nonlinear releases, which have not been considered before in the statistical privacy literature. In
Figure 10: Mean estimate for CCA and its private release using the Matérn(3/2) kernel (K3) with CV.
   Kernel   range φ         range ρ        optimum φ   optimum ρ
   K4       [10⁻⁴, 0.1]     [0.2, 1]       0.01        0.466
   K3       [10⁻⁴, 0.1]     [0.05, 0.5]    0.01        0.450
   K1       [10⁻⁴, 0.1]     [0.01, 0.1]    0.005       0.030
Table 2: Optimal penalty and range parameters (φ, ρ) for each kernel, selected by PCV, in the CCA application.
Figure 12: Mean estimate of CCA and its private release for the Exponential (K4), Matérn(3/2) (K3), and Gaussian (K1) kernels using PCV.
a study may collect and analyze many pieces of information such as genomic sequence data, biomarker data, and biomedical images, which alone, or if linked and/or combined with each other or with other demographic information, lead to greater disclosure risk, albeit one that still requires thorough understanding itself; for a recent example of potential identification risks related to functional data, see Lippert et al. (2017).
The heart of our work utilizes densities for functional data in a way that has not yet been explored in the literature. Previously, densities for functional data were thought not to exist (Delaigle and Hall, 2010), and researchers instead relied on various approximations to densities. Here we show that densities do in fact exist, and we utilize them in our theoretical results. At the heart of this issue is the orthogonality of probability measures; in infinite-dimensional spaces, dealing with the orthogonality of measures becomes a major issue, which has a direct impact on the utility of density-based methods for FDA.
The literature on privacy for scalar and multivariate data is quite extensive, while there has
been very little work done for FDA. Therefore, there are many opportunities for developing
additional theory and methods for functional data and estimates. One issue that we believe
will be especially important is the role of smoothing and regularization in preserving the utility
of privacy enhanced releases. As we have seen, a bit of extra smoothing can go a long way in
terms of maintaining privacy.
References
F. Aldà and B. I. Rubinstein. The Bernstein mechanism: Function release under differential privacy. In AAAI, pages 1705–1711, 2017.
G. Baxter. A strong limit theorem for Gaussian processes. Proceedings of the American Mathematical Society, 7(3):522–527, 1956.
A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, 2011.
F. Black and M. Scholes. The pricing of options and corporate liabilities. The Journal of Political Economy, pages 637–654, 1973.
E. Bongiorno and A. Goia. Classification methods for Hilbert data based on surrogate density. arXiv preprint arXiv:1506.03571, 2015.
R. Cameron and W. Martin. Transformations of Wiener integrals under a general class of linear transformations. Transactions of the American Mathematical Society, 58(2):184–219, 1945.
R. H. Cameron and W. T. Martin. Transformations of Wiener integrals under translations. Annals of Mathematics, pages 386–396, 1944.
R. H. Cameron and W. T. Martin. The behavior of measure and measurability under change of scale in Wiener space. Bulletin of the American Mathematical Society, 53(2):130–137, 1947.
K. Chaudhuri, C. Monteleoni, and D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.
T. Dalenius. Towards a methodology for statistical disclosure control. Statistik Tidskrift, 15:429–444, 1977.
A. Delaigle and P. Hall. Defining probability density for a distribution of random functions. The Annals of Statistics, 38(2):1171–1193, 2010.
C. Dwork. Differential privacy. In 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), volume 4052, pages 1–12, Venice, Italy, July 2006. Springer Verlag. ISBN 3-540-35907-9. URL https://www.microsoft.com/en-us/research/publication/differential-privacy/.
C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Found. Trends
Theor. Comput. Sci., 9(3–4):211–407, Aug. 2014. ISSN 1551-305X. doi: 10.1561/
0400000042. URL http://dx.doi.org/10.1561/0400000042.
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the 24th Annual International Conference on The Theory and Applications of Cryptographic Techniques, EUROCRYPT'06, pages 486–503, Berlin, Heidelberg, 2006a. Springer-Verlag. ISBN 3-540-34546-9, 978-3-540-34546-6. doi: 10.1007/11761679_29. URL http://dx.doi.org/10.1007/11761679_29.
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis, pages 265–284. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006b. ISBN 978-3-540-32732-5. doi: 10.1007/11681878_14. URL https://doi.org/10.1007/11681878_14.
Y. Erlich and A. Narayanan. Routes for breaching and protecting genetic privacy. Nature
Reviews Genetics, 15(6):409–421, 2014.
J. Feldman. Equivalence and perpendicularity of Gaussian processes. Pacific Journal of Mathematics, 8(4):699–708, 1958.
F. Ferraty and Y. Romain. The Oxford Handbook of Functional Data Analysis. Oxford University Press, 2011.
S. Fienberg and A. Slavkovi´c. Data Privacy and Confidentiality, pages 342–345. International Encyclopedia of Statistical Science. Springer-Verlag, 2010.
I. V. Girsanov. On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Theory of Probability & Its Applications, 5(3):285–301, 1960.
U. Grenander. Stochastic processes and statistical inference. Arkiv för Matematik, 1(3):195–277, 1950.
R. Hall, A. Rinaldo, and L. Wasserman. Differential privacy for functions and functional data.
The Journal of Machine Learning Research, 14(1):703–727, 2013.
L. Horváth and P. Kokoszka. Inference for functional data with applications, volume 200. Springer Science & Business Media, 2012.
L. Huang, F. Scheipl, J. Goldsmith, J. Gellar, J. Harezlak, M. W. McLean, B. Swihart, L. Xiao, C. Crainiceanu, and P. Reiss. refund: Regression with Functional Data, 2016. URL https://CRAN.R-project.org/package=refund. R package version 0.1-14.
I. A. Ibragimov and Y. A. Rozanov. Gaussian random processes, volume 9. Springer Science & Business Media, 1978.
S. Kakutani. On equivalence of infinite product measures. Annals of Mathematics, pages 214– 224, 1948.
D. Kifer, A. Smith, and A. Thakurta. Private convex empirical risk minimization and high-dimensional regression. In S. Mannor, N. Srebro, and R. C. Williamson, editors, Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 25.1–25.40, Edinburgh, Scotland, 25–27 Jun 2012. PMLR. URL http://proceedings.mlr.press/v23/kifer12.html.
P. Kokoszka and M. Reimherr. Introduction to functional data analysis. CRC Press, 2017.
J. Kulynych. Legal and ethical issues in neuroimaging research: human subjects protection, medical privacy, and the public communication of research results. Brain and cognition, 50 (3):345–357, 2002.
R. Laha and V. Rohatgi. Probability Theory. Wiley, New York, 1979.
J. Lane, V. Stodden, S. Bender, and H. Nissenbaum. Privacy, big data, and the public good:
Frameworks for engagement. Cambridge University Press, 2014.
C. Lippert, R. Sabatini, M. C. Maher, E. Y. Kang, S. Lee, O. Arikan, A. Harley, A. Bernal, P. Garst, V. Lavrenko, K. Yocum, T. Wong, M. Zhu, W.-Y. Yang, C. Chang, T. Lu, C. W. H. Lee, B. Hicks, S. Ramakrishnan, H. Tang, C. Xie, J. Piper, S. Brewerton, Y. Turpaz, A. Telenti, R. K. Roby, F. J. Och, and J. C. Venter. Identification of individuals by trait prediction using whole-genome sequencing data. Proceedings of the National Academy of Sciences, 114(38):10166–10171, 2017. doi: 10.1073/pnas.1711125114. URL http://www.pnas.org/content/114/38/10166.abstract.
O. Nikodym. Sur une généralisation des intégrales de M. J. Radon. Fundamenta Mathematicae, 15(1):131–179, 1930.
J. Radon. Theorie und Anwendungen der absolut-additiven Mengenfunktionen. Sitzungsberichte der mathematisch-naturwissenschaftlichen Klasse der Kaiserlichen Akademie der Wissenschaften, 122:1295–1438, 1913.
J. O. Ramsay and B. W. Silverman. Applied functional data analysis: methods and case studies, volume 77. Springer New York, 2002.
J. O. Ramsay and B. W. Silverman. Functional data analysis. Springer New York, 2005.
C. R. Rao and V. Varadarajan. Discrimination of Gaussian processes. Sankhyā: The Indian Journal of Statistics, Series A, pages 303–330, 1963.
Y. A. Rozanov. On the density of one Gaussian measure with respect to another. Theory of Probability & Its Applications, 7(1):82–87, 1962.
Y. A. Rozanov. On probability measures in functional spaces corresponding to stationary Gaussian processes. Theory of Probability & Its Applications, 9(3):404–420, 1964.
E. E. Schadt, S. Woo, and K. Hao. Bayesian method to predict individual SNP genotypes from gene expression data. Nature Genetics, 44(5):603–608, 2012.
L. Shepp. The singularity of Gaussian measures in function space. Proceedings of the National Academy of Sciences, 52(2):430–433, 1964.
L. Shepp. Radon–Nikodym derivatives of Gaussian measures. The Annals of Mathematical Statistics, pages 321–354, 1966.
A. Skorohod. On the densities of probability measures in functional spaces. Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66) Vol. II: Contributions to
Probability Theory, 1:163–182, 1967.
M. L. Stein. Interpolation of spatial data: some theory for kriging. Springer Science & Business Media, 2012.
D. Varberg. On equivalence of Gaussian measures. Pacific Journal of Mathematics, 11(2):751–762, 1961.
D. E. Varberg. On Gaussian measures equivalent to Wiener measure. Transactions of the American Mathematical Society, 113(2):262–273, 1964.
D. E. Varberg. On Gaussian measures equivalent to Wiener measure II. Mathematica Scandinavica, 18(2):143–160, 1967.
L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010. doi: 10.1198/jasa.2009.tm08651. URL http://dx.doi.org/10.1198/jasa.2009.tm08651.
L. Willenborg and T. De Waal. Statistical disclosure control in practice. Number 111. Springer Science & Business Media, 1996.
A Proofs
Lemma 1. If the privacy mechanism $\tilde f_D$ achieves $(\alpha, \beta)$-DP, then so does any measurable transformation of it.
Proof. Let $T : \Omega \to \Omega'$ be a measurable transformation between two metric spaces (say with respect to the Borel $\sigma$-algebras). Then we have
\[
P(T(\tilde f_D) \in A) = P(\tilde f_D \in T^{-1}(A)) \le e^{\alpha} P(\tilde f_{D'} \in T^{-1}(A)) + \beta = e^{\alpha} P(T(\tilde f_{D'}) \in A) + \beta,
\]
as required.
Proof of Theorem 1. The result will follow if we can show that
matrix can be expressed using the covariance operator $C$, namely,
\[
K_{ij} = E[\langle Z, h_i\rangle_H \langle Z, h_j\rangle_H] = \langle C(h_i), h_j\rangle_H.
\]
has by direct verification that
We can then swap from the inner product over $H$ to one over $K$:
\[
\langle f_D - f_{D'},\, P_1(f_D - f_{D'})\rangle_H = \langle f_D - f_{D'},\, C(P_1(f_D - f_{D'}))\rangle_K = \langle f_D - f_{D'},\, P(f_D - f_{D'})\rangle_K.
\]
If we show that $P$ is a projection operator, i.e., symmetric and idempotent, we will have the desired bound:
\[
\langle f_D - f_{D'},\, P(f_D - f_{D'})\rangle_K \le \|f_D - f_{D'}\|_K^2.
\]
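The step from "P is a projection" to the bound deserves one line of justification: a symmetric idempotent operator is a contraction in the corresponding inner product. With $g = f_D - f_{D'}$:

```latex
% Idempotency and symmetry of P in the K inner product give
\langle g, Pg\rangle_K = \langle g, P^2 g\rangle_K = \langle Pg, Pg\rangle_K = \|Pg\|_K^2 .
% Cauchy--Schwarz then bounds the projected element:
\|Pg\|_K^2 = \langle Pg, g\rangle_K \le \|Pg\|_K \, \|g\|_K
\quad\Longrightarrow\quad \|Pg\|_K \le \|g\|_K ,
% hence \langle g, Pg\rangle_K = \|Pg\|_K^2 \le \|g\|_K^2, the desired bound.
```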
First, $P$ is idempotent because
\begin{align*}
P^2 &= \sum_{i=1}^{N} C(h_i) \sum_{j=1}^{N} (K^{-1})_{ij} \Bigl\langle h_j,\; \sum_{k=1}^{N} C(h_k) \sum_{l=1}^{N} (K^{-1})_{kl} \langle h_l, \cdot\rangle_H \Bigr\rangle_H \\
&= \sum_{i=1}^{N} C(h_i) \sum_{j=1}^{N} (K^{-1})_{ij} \sum_{k=1}^{N} \langle C(h_k), h_j\rangle_H \sum_{l=1}^{N} (K^{-1})_{kl} \langle h_l, \cdot\rangle_H \\
&= \sum_{i=1}^{N} C(h_i) \sum_{l=1}^{N} (K^{-1})_{il} \langle h_l, \cdot\rangle_H = P.
\end{align*}
Second, $P$ is symmetric with respect to the $K$ inner product since
Hence $P$ is a projection operator from $H$ to $K$, and the claim of the theorem holds.
Proof of Theorem 2. We aim to prove that $\frac{dP_D}{dQ}(h) = \exp$
the same probability measure. We will accomplish this by showing that both have the same moment generating functional. Since $Z$ is Gaussian noise with mean zero and covariance operator $C$, $\tilde f_D$ is a Gaussian process with mean $f_D$ and covariance operator $\delta^2 C$. The moment generating functional of $\tilde f_D$ is then given by
On the other hand, the moment generating functional of $P^{\star}$ is
So both have the same moment generating functional and are thus the same measure, which proves the claim.
Proof of Theorem 3. We aim to show that for any measurable subset $A \subset H$ we have
\[
P_D(A) \le e^{\alpha} P_{D'}(A) + \beta.
\]
Recall that the global sensitivity for the functional case is
\[
\Delta^2 = \sup
\]
We equivalently aim to show that
We can express
$\alpha$. Then trivially we have that
\[
P_D(A) = P_D(A \cap K_1) + P_D(A \cap K_2).
\]
Using the definition of $K_1$ we have that
\[
P_D(A \cap K_1) =
\]
The proof will be complete if we can show that
\[
P_D(A \cap K_2) \le \beta.
\]
This is equivalent to showing that
\[
P\Bigl( -\tfrac{1}{2\delta^2}\bigl( -\|f_D - f_{D'}\|_K^2 - 2\langle f_D - f_{D'},\, X\rangle_K \bigr) \ge \alpha \Bigr) \le \beta,
\]
where $X \sim N(0, \delta^2 C)$. This can equivalently be stated as
\begin{align*}
-\tfrac{1}{2\delta^2}\bigl( -\|f_D - f_{D'}\|_K^2 - 2\langle f_D - f_{D'},\, X\rangle_K \bigr) \ge \alpha
&\iff \langle f_D - f_{D'},\, X\rangle_K \ge \delta^2\Bigl( \alpha - \tfrac{1}{2\delta^2}\|f_{D'} - f_D\|_K^2 \Bigr) \\
&\iff \langle C^{-1}(f_D - f_{D'}),\, X\rangle_H \ge \delta^2\Bigl( \alpha - \tfrac{1}{2\delta^2}\|f_D - f_{D'}\|_K^2 \Bigr).
\end{align*}
However, $\langle X, C^{-1}(f_D - f_{D'})\rangle_H$ is a normal random variable with mean zero and variance
\[
\delta^2 \langle C(C^{-1}(f_{D'} - f_D)),\, C^{-1}(f_{D'} - f_D)\rangle_H = \delta^2 \|f_{D'} - f_D\|_K^2 \le \delta^2 \Delta^2.
\]
So, if $Z \sim N(0,1)$, then we have that
\begin{align*}
P\Bigl( -\tfrac{1}{2\delta^2}\bigl( -\|f_D - f_{D'}\|_K^2 - 2\langle f_D - f_{D'},\, X\rangle_K \bigr) \ge \alpha \Bigr)
&\le P\Bigl( \delta\Delta Z \ge \delta^2\Bigl( \alpha - \tfrac{1}{2\delta^2}\|f_D - f_{D'}\|_K^2 \Bigr) \Bigr) \\
&\le P\Bigl( Z \ge \tfrac{\delta}{\Delta}\Bigl( \alpha - \tfrac{\Delta^2}{2\delta^2} \Bigr) \Bigr) \\
&= P\Bigl( Z \ge \sqrt{2\log(2/\beta)} - \tfrac{\alpha}{2\sqrt{2\log(2/\beta)}} \Bigr) \\
&\le P\Bigl( Z \ge \sqrt{2\log(2/\beta)} - \tfrac{1}{2\sqrt{2\log(2/\beta)}} \Bigr) \le \beta,
\end{align*}
as long as $\alpha \le 1$ (Hall et al., 2013).
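The final chain of inequalities is easy to verify numerically. The sketch below (not from the paper) checks, for several values of β, that P(Z ≥ √(2 log(2/β)) − 1/(2√(2 log(2/β)))) ≤ β, writing the standard normal tail via the complementary error function.

```python
import math

def normal_tail(x):
    """P(Z >= x) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# check the bound in the last step of the proof for several beta values
for beta in (0.1, 0.01, 0.001, 1e-6):
    c = math.sqrt(2.0 * math.log(2.0 / beta))
    tail = normal_tail(c - 1.0 / (2.0 * c))
    assert tail <= beta, (beta, tail)
```

For β = 0.1 the bound already holds with room to spare (the tail is about 0.012), and the slack grows as β shrinks.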
Proof of Theorem 4. The upper bound for $\Delta_n^2$ is derived as follows:
\[
\Delta_n^2 = \sup
\]
B Derivation of RKHS Estimate
Recall that
We can express $g$ as
$g(M) = 1$
We then have that
$g(M) = 1$
Lemma 2. Let $h$ and $M$ be elements of a real separable Hilbert space $H$. Then we have that
1. $\frac{\partial}{\partial M}\langle h, M\rangle_H = h$;
2. $\frac{\partial}{\partial M}\langle A(M), M\rangle_H = 2A(M)$,
where $A$ is a symmetric bounded linear operator.
We then have that
\[
\frac{\partial g(M)}{\partial M} = -2C(\bar X) + 2C(M) + 2\phi M.
\]
Setting the above equal to zero, we have that
\[
C(\bar X) = C(\hat\mu) + \phi\hat\mu, \tag{7}
\]
or
\[
\hat\mu = (C + \phi I)^{-1} C(\bar X).
\]
In addition, $\hat\mu$ can be written as $\hat\mu(t) = \sum_{j=1}^{\infty} \langle\hat\mu, v_j\rangle_H v_j(t)$. So property (7) and $\langle C(h_1), h_2\rangle_K = \langle h_1, h_2\rangle_H$ imply that
\begin{align*}
\langle C(\bar X), v_j\rangle_K &= \langle C(\hat\mu), v_j\rangle_K + \phi \langle\hat\mu, v_j\rangle_K \\
\langle \bar X, v_j\rangle_H &= \langle\hat\mu, v_j\rangle_H + \phi \Bigl\langle \hat\mu, \frac{v_j}{\lambda_j} \Bigr\rangle_H,
\end{align*}
and eventually
\[
\langle\hat\mu, v_j\rangle_H = \frac{\lambda_j}{\lambda_j + \phi}\, \langle \bar X, v_j\rangle_H.
\]
Hence we have
\begin{align*}
\hat\mu(t) &= \sum_{j=1}^{\infty} \langle\hat\mu, v_j\rangle_H\, v_j(t) \\
&= \sum_{j=1}^{\infty} \frac{\lambda_j}{\lambda_j + \phi}\, \langle \bar X, v_j\rangle_H\, v_j(t) \\
&= \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{\infty} \frac{\lambda_j}{\lambda_j + \phi}\, \langle X_i, v_j\rangle_H\, v_j(t).
\end{align*}
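On a discrete grid, this spectral form of the estimator can be sketched as below. The equal-weight quadrature discretization of C and the squared-exponential kernel are illustrative assumptions; the key point is the shrinkage factor λ_j/(λ_j + φ) applied to the coefficients of the sample mean.

```python
import numpy as np

def rkhs_mean(X, t, phi, rho=0.1):
    """mu_hat = sum_j lambda_j/(lambda_j + phi) <Xbar, v_j> v_j, computed from
    the eigendecomposition of the covariance operator C discretized on the
    grid t. Equivalent to (C + phi I)^{-1} C Xbar."""
    w = (t[-1] - t[0]) / (len(t) - 1)                   # quadrature weight
    C = np.exp(-((t[:, None] - t[None, :]) ** 2) / rho ** 2) * w
    lam, V = np.linalg.eigh(C)                          # eigenpairs of C
    lam = np.clip(lam, 0.0, None)                       # clip rounding noise
    xbar = X.mean(axis=0)
    return V @ (lam / (lam + phi) * (V.T @ xbar))       # spectral shrinkage

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
X = 0.1 * np.sin(2 * np.pi * t) + 0.02 * rng.standard_normal((25, len(t)))
mu_hat = rkhs_mean(X, t, phi=0.01)
```

Since every factor λ_j/(λ_j + φ) lies in [0, 1], the estimate shrinks the sample mean: larger φ damps the high-frequency (small-λ_j) components more heavily, which is exactly the oversmoothing observed for the largest φ values in the application.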