Differentially Private High Dimensional Sparse Covariance Matrix Estimation

Di Wang^{a,*}, Jinhui Xu^{b}

^{a}Division of Computer, Electrical and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
^{b}Department of Computer Science and Engineering, State University of New York at Buffalo, 338 Davis Hall, Buffalo, NY 14260

A preliminary version of this paper appeared in Proceedings of The 53rd Annual Conference on Information Sciences and Systems (CISS 2019). This research was supported in part by the National Science Foundation (NSF) through grants CCF-1422324 and CCF-1716400.
*Corresponding author.
Abstract
In this paper, we study the problem of estimating the covariance matrix under differential privacy, where the underlying covariance matrix is assumed to be sparse and of high dimension. We propose a new method, called DP-Thresholding, that achieves a non-trivial $\ell_2$-norm based error bound whose dependence on the dimension drops to logarithmic instead of polynomial; this is significantly better than existing methods, which add noise directly to the empirical covariance matrix. We also extend the $\ell_2$-norm based error bound to a general $\ell_w$-norm based one for any $1 \le w \le \infty$, and show that they share the same upper bound asymptotically. Our approach can be easily extended to local differential privacy. Experiments on synthetic datasets show results that are consistent with the theoretical claims.
Keywords: Differential privacy, sparse covariance estimation, high dimensional statistics
1. Introduction
In recent years, machine learning and statistical estimation have had a profound impact on many applied domains such as social sciences, genomics, and medicine. A frequently encountered challenge in their applications is how to deal with the high dimensionality of the datasets, especially those in genomics, educational, and psychological research. A commonly adopted strategy for dealing with this issue is to assume that the underlying structures of the parameters are sparse.
Another often encountered challenge is how to handle sensitive data, such as those in social science, biomedicine, and genomics. A promising approach is to use differentially private mechanisms for statistical inference and learning tasks. Differential Privacy (DP) [?] is a widely accepted criterion that provides provable protection against identification and is resilient to arbitrary auxiliary information that might be available to attackers. Since its introduction over a decade ago, a rich line of work has become available, making differential privacy a compelling privacy-enhancing technology for many organizations, such as Uber [?], Google [?], and Apple [?].
Estimating or studying high dimensional datasets while keeping them (locally) differentially private can be quite challenging for many problems, such as sparse linear regression [?], sparse mean estimation [?], and the selection problem [?]. However, there is also evidence showing that for some problems the loss under privacy constraints can be quite small compared with their non-private counterparts. Examples of this nature include Empirical Risk Minimization under sparsity constraints [? ?], high dimensional sparse PCA [? ? ?], sparse inverse covariance estimation [?], and high dimensional distribution estimation [?]. Thus, it is desirable to determine which high dimensional problems can be learned or estimated efficiently in a private manner.
In this paper, we aim to give an answer to this question for a simple but fundamental problem in machine learning and statistics, namely estimating the underlying sparse covariance matrix of a bounded sub-Gaussian distribution. For this problem, we propose a simple but nontrivial $(\epsilon, \delta)$-DP method, DP-Thresholding, and show that the squared $\ell_w$-norm error for any $1 \le w \le \infty$ is bounded by $O\big(\frac{s^2\log p}{n} + \frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2} + \frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\big)$, where $n$ is the sample size, $p$ is the dimension of the underlying space, and $s$ is the sparsity of each row in the underlying covariance matrix. Moreover, our method can be easily extended to the local differential privacy model with an upper bound of $O\big(\frac{s^2\log p\log\frac{1}{\delta}}{n\epsilon^2}\big)$. Experiments on synthetic datasets confirm the theoretical claims. To the best of our knowledge, this is the first paper studying the problem of estimating a high dimensional sparse covariance matrix under (local) differential privacy.
2. Related Work
Recently, there have been several papers studying private distribution estimation, such as [? ? ? ? ?]. For distribution estimation under the central differential privacy model, [?] considers the one-dimensional private mean estimation of a Gaussian distribution with (un)known variance. The work that is probably most closely related to ours is [?], which studies the problem of privately learning multivariate Gaussian and product distributions. The main differences from ours are the following. Firstly, our goal is to estimate the covariance of a sub-Gaussian distribution. Even though the class of distributions considered in our paper is larger than the one in [?], it carries an additional assumption requiring the $\ell_2$ norm of a sample of the distribution to be bounded by 1; this means that it does not include the general Gaussian distribution. Secondly, although [?] also considers the high dimensional case, it does not assume sparsity of the underlying covariance matrix. Thus, its error bound depends on the dimensionality $p$ polynomially, which is large in the high dimensional case ($p \gg n$), while the dependence in our paper is only logarithmic (i.e., $\log p$). Thirdly, the error in [?] is measured by the total variation distance, while it is measured by the $\ell_w$-norm in our paper; thus, the two results are not comparable. Fourthly, it seems difficult to extend the methods of [?] to the local model. Recently, [?] also studied covariance matrix estimation via iterative eigenvector sampling. However, their method only applies to the low dimensional case, and the error is measured with respect to the Frobenius norm.
Distribution estimation under local differential privacy has been studied in [? ?]. However, both of these works consider only the one-dimensional Gaussian distribution, which is quite different from the class of distributions in our paper.
In this paper, we mainly use the Gaussian mechanism on the covariance matrix, which has been studied in [? ? ?]. However, as will be shown later, simply outputting the perturbed covariance can incur a large error and is thus insufficient for our problem. Compared to these previous works, the problem in this paper is clearly more complicated, since we assume it lies in the high dimensional regime where $p \gg n$.
3. Preliminaries

3.1. Differential Privacy
Differential privacy [?] is by now a de facto standard for statistical data privacy, providing a strong guarantee for algorithms on aggregate databases. DP requires that there be no significant change in the outcome distribution when a single entry of the dataset changes. We say that two datasets $D, D'$ are neighbors if they differ by only one entry, denoted as $D \sim D'$.
Definition 1 (Differential Privacy [?]). A randomized algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private (DP) if for all neighboring datasets $D, D'$ and for all measurable events $S$ in the output space of $\mathcal{A}$, the following holds:
$$\mathbb{P}(\mathcal{A}(D) \in S) \le e^{\epsilon}\,\mathbb{P}(\mathcal{A}(D') \in S) + \delta.$$
When $\delta = 0$, $\mathcal{A}$ is $\epsilon$-differentially private.
We will use the Gaussian mechanism [?] to guarantee $(\epsilon, \delta)$-DP.
Definition 2 (Gaussian Mechanism [?]). Given any function $q: \mathcal{X}^n \to \mathbb{R}^p$, the Gaussian mechanism is defined as
$$\mathcal{M}_G(D, q, \epsilon) = q(D) + Y,$$
where $Y$ is drawn from the Gaussian distribution $\mathcal{N}(0, \sigma^2 I_p)$ with $\sigma \ge \frac{\sqrt{2\log(1.25/\delta)}\,\Delta_2(q)}{\epsilon}$. Here $\Delta_2(q)$ is the $\ell_2$-sensitivity of the function $q$, i.e.,
$$\Delta_2(q) = \sup_{D \sim D'} \|q(D) - q(D')\|_2.$$
The Gaussian mechanism preserves $(\epsilon, \delta)$-differential privacy.
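To make the mechanism concrete, the following is a minimal NumPy sketch of Definition 2; the function name `gaussian_mechanism` and its arguments are illustrative choices of ours, not notation from the paper.

```python
import numpy as np

def gaussian_mechanism(q_value, l2_sensitivity, eps, delta, rng=None):
    """Release q(D) under (eps, delta)-DP by adding Gaussian noise with
    standard deviation sqrt(2*log(1.25/delta)) * Delta_2(q) / eps."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / eps
    return q_value + rng.normal(scale=sigma, size=np.shape(q_value))
```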
3.2. Private Sparse Covariance Estimation
Let $x_1, x_2, \dots, x_n$ be $n$ random samples from a $p$-variate distribution with covariance matrix $\Sigma = (\sigma_{ij})_{1 \le i,j \le p}$, where the dimensionality $p$ is assumed to be high, i.e., $p \gg n \ge \mathrm{Poly}(\log p)$.
We define the parameter space of $s$-sparse covariance matrices as follows:
$$\mathcal{G}_0(s) = \{\Sigma = (\sigma_{ij})_{1 \le i,j \le p} : \sigma_{-j,j} \text{ is } s\text{-sparse } \forall j \in [p]\}, \qquad (1)$$
where $\sigma_{-j,j}$ denotes the $j$-th column of $\Sigma$ with the entry $\sigma_{jj}$ removed. That is, a matrix in $\mathcal{G}_0(s)$ has at most $s$ non-zero off-diagonal elements in each column.
We assume that each $x_i$ is sampled from a zero-mean sub-Gaussian distribution with parameter $\sigma^2$, that is,
$$\mathbb{E}[x_i] = 0, \quad \mathbb{P}\{|v^T x_i| > t\} \le e^{-\frac{t^2}{2\sigma^2}}, \quad \forall t > 0 \text{ and } \forall \|v\|_2 = 1. \qquad (2)$$
This means that all the one-dimensional marginals of $x_i$ have sub-Gaussian tails. We also assume that, with probability 1, $\|x_i\|_2 \le 1$. We note that such assumptions are quite common in the differential privacy literature, such as [?].
Let $\mathcal{P}_p(\sigma^2, s)$ denote the set of distributions of $x_i$ satisfying all the above conditions (i.e., (2) and $\|x_i\|_2 \le 1$) and with covariance matrix $\Sigma \in \mathcal{G}_0(s)$. The goal of private covariance estimation is to obtain an estimator $\Sigma^{\mathrm{priv}}$ of the underlying covariance matrix $\Sigma$ based on $\{x_1, \dots, x_n\} \sim P \in \mathcal{P}_p(\sigma^2, s)$ while preserving privacy. In this paper, we focus on $(\epsilon, \delta)$-differential privacy. We use the $\ell_2$ norm to measure the difference between $\Sigma^{\mathrm{priv}}$ and $\Sigma$, i.e., $\|\Sigma^{\mathrm{priv}} - \Sigma\|_2$.
Lemma 1 ([?]). Let $\{x_1, \dots, x_p\}$ be $p$ random variables sampled from a Gaussian distribution $\mathcal{N}(0, \sigma^2)$. Then
$$\mathbb{E}\max_{1 \le i \le p} |x_i| \le \sigma\sqrt{2\log 2p}, \qquad (3)$$
$$\mathbb{P}\Big\{\max_{1 \le i \le p} |x_i| \ge t\Big\} \le 2p\,e^{-\frac{t^2}{2\sigma^2}}. \qquad (4)$$
In particular, if $p = 1$, we have $\mathbb{P}\{|x_i| \ge t\} \le 2e^{-\frac{t^2}{2\sigma^2}}$.
Lemma 2 ([?]). If $\{x_1, x_2, \dots, x_n\}$ are sampled from a sub-Gaussian distribution as in (2) and $\Sigma^* = (\sigma^*_{ij})_{1 \le i,j \le p} = \frac{1}{n}\sum_{k=1}^{n} x_k x_k^T$ is the empirical covariance matrix, then there exist constants $C_1, \eta$ and $\gamma > 0$ such that for all $i, j \in [p]$,
$$\mathbb{P}(|\sigma^*_{ij} - \sigma_{ij}| > t) \le C_1 e^{-\frac{n t^2}{8\gamma^2}} \qquad (5)$$
for all $|t| \le \eta$, where $C_1, \eta$ and $\gamma$ are constants depending only on $\sigma^2$. In particular,
$$\mathbb{P}\Big\{|\sigma^*_{ij} - \sigma_{ij}| > \gamma\sqrt{\frac{\log p}{n}}\Big\} \le C_1 p^{-8}. \qquad (6)$$
Notation. All the constants and big-$O$ notation throughout the paper omit factors that are polynomial in the sub-Gaussian parameter $\sigma^2$. Many previous papers assume the sub-Gaussian parameter to be a constant, such as [? ?].
4. Method
4.1. A First Approach
A direct way to obtain a private estimator is to perturb the empirical covariance matrix by a symmetric Gaussian matrix, a technique used in previous work on private PCA, such as [? ?]. However, as we will see below, this method introduces a large error.
By [?], for any given $0 < \epsilon, \delta \le 1$ and $\{x_1, x_2, \dots, x_n\} \sim P \in \mathcal{P}_p(\sigma^2, s)$, the following perturbation procedure is $(\epsilon, \delta)$-differentially private:
$$\tilde{\Sigma} = \Sigma^* + Z = (\tilde{\sigma}_{ij})_{1 \le i,j \le p} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^T + Z, \qquad (7)$$
where $Z$ is a symmetric matrix whose upper triangle (including the diagonal) consists of i.i.d. samples from $\mathcal{N}(0, \sigma_1^2)$ with $\sigma_1^2 = \frac{2\log(1.25/\delta)}{n^2\epsilon^2}$, and each lower-triangle entry is copied from its upper-triangle counterpart. By Corollary 2.3.6 of [?], we know that $\|Z\|_2 \le O(\sqrt{p}\,\sigma_1) = O\Big(\frac{\sqrt{p\log\frac{1}{\delta}}}{n\epsilon}\Big)$ with high probability. We can then easily get that, with high probability (i.e., with probability at least $1 - \frac{1}{p^c}$ for some $c > 0$),
$$\|\tilde{\Sigma} - \Sigma\|_2 \le \|\Sigma^* - \Sigma\|_2 + \|Z\|_2 \le O\Big(\frac{\sqrt{p\log\frac{1}{\delta}}}{n\epsilon}\Big), \qquad (8)$$
where the second inequality is due to a theorem in Chapter 1.6.3 of [?]. However, we can see that the upper bound on the error in (8) is quite large in the high dimensional case.
Another issue with the private estimator in (7) is that it is not clear whether it is positive semi-definite, a property that is normally expected of an estimator.
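For reference, here is a short NumPy sketch of the perturbation (7) under the stated assumption $\|x_i\|_2 \le 1$; the helper name `perturbed_covariance` is ours.

```python
import numpy as np

def perturbed_covariance(X, eps, delta, rng=None):
    """Empirical covariance of the rows of X plus a symmetric Gaussian
    noise matrix, as in Eq. (7), with sigma_1^2 = 2*log(1.25/delta)/(n*eps)^2."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    emp_cov = X.T @ X / n                      # Sigma* = (1/n) sum_i x_i x_i^T
    sigma1 = np.sqrt(2.0 * np.log(1.25 / delta)) / (n * eps)
    upper = np.triu(rng.normal(scale=sigma1, size=(p, p)))
    Z = upper + np.triu(upper, 1).T            # mirror the upper triangle down
    return emp_cov + Z
```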
4.2. Post-processing via Thresholding
We note that one of the reasons the private estimator $\tilde{\Sigma}$ in (7) fails is that some entries of the noise can be quite large, which makes $|\tilde{\sigma}_{ij} - \sigma_{ij}|$ large for some $i, j$. More precisely, by (4) and (6) we can get the following: with probability at least $1 - Cp^{-6}$, for all $1 \le i, j \le p$,
$$|\tilde{\sigma}_{ij} - \sigma_{ij}| \le \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log\frac{1.25}{\delta}}\,\frac{\sqrt{\log p}}{n\epsilon} = O\Big(\gamma\sqrt{\frac{\log p}{n\epsilon^2}}\Big). \qquad (9)$$
Thus, to reduce the error, a natural approach is the following. For those $\sigma_{ij}$ with large values, we keep the corresponding $\tilde{\sigma}_{ij}$, so that their difference stays below some threshold. For those $\sigma_{ij}$ whose values are small compared with (9), the corresponding $\tilde{\sigma}_{ij}$ may still be large, so thresholding $\tilde{\sigma}_{ij}$ to 0 lowers the error $|\tilde{\sigma}_{ij} - \sigma_{ij}|$.
Following this line of thinking and the thresholding methods in [?] and [?], we propose the following DP-Thresholding method, which post-processes the perturbed covariance matrix in (7) with the threshold $\gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}$. After this entrywise thresholding, we further threshold the eigenvalues of $\hat{\Sigma}$ to make the final estimator positive semi-definite. See Algorithm 1 for details.
Algorithm 1 DP-Thresholding
Input: $\{x_1, x_2, \dots, x_n\} \sim P \in \mathcal{P}_p(\sigma^2, s)$, and $\epsilon, \delta \in (0, 1)$.
1: Compute
$$\tilde{\Sigma} = (\tilde{\sigma}_{ij})_{1 \le i,j \le p} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^T + Z,$$
where $Z$ is a symmetric matrix whose upper triangle (including the diagonal) consists of i.i.d. samples from $\mathcal{N}(0, \sigma_1^2)$ with $\sigma_1^2 = \frac{2\log(1.25/\delta)}{n^2\epsilon^2}$, and each lower-triangle entry is copied from its upper-triangle counterpart.
2: Define the thresholding estimator $\hat{\Sigma} = (\hat{\sigma}_{ij})_{1 \le i,j \le p}$ as
$$\hat{\sigma}_{ij} = \tilde{\sigma}_{ij} \cdot \mathbb{I}\Big[|\tilde{\sigma}_{ij}| > \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big]. \qquad (10)$$
3: Let the eigen-decomposition of $\hat{\Sigma}$ be $\hat{\Sigma} = \sum_{i=1}^{p} \lambda_i v_i v_i^T$, let $\lambda_i^+ = \max\{\lambda_i, 0\}$ be the positive part of $\lambda_i$, and define $\Sigma^+ = \sum_{i=1}^{p} \lambda_i^+ v_i v_i^T$.
4: return $\Sigma^+$.
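The following NumPy sketch mirrors the three steps of Algorithm 1; it reuses `perturbed_covariance` from the sketch in Section 4.1, and `gamma` stands for the sub-Gaussian constant of Lemma 2, which must be supplied by the user.

```python
import numpy as np

def dp_thresholding(X, eps, delta, gamma=1.0, rng=None):
    """Sketch of Algorithm 1 (DP-Thresholding)."""
    n, p = X.shape
    # Step 1: perturbed empirical covariance, Eq. (7).
    sigma_tilde = perturbed_covariance(X, eps, delta, rng)
    # Step 2: entrywise hard thresholding at the level in Eq. (10).
    tau = (gamma * np.sqrt(np.log(p) / n)
           + 4.0 * np.sqrt(2.0 * np.log(1.25 / delta)) * np.sqrt(np.log(p)) / (n * eps))
    sigma_hat = np.where(np.abs(sigma_tilde) > tau, sigma_tilde, 0.0)
    # Step 3: keep only the positive part of the spectrum (PSD projection).
    vals, vecs = np.linalg.eigh(sigma_hat)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T
```

Since Steps 2 and 3 are post-processing, they consume no additional privacy budget, so the whole procedure inherits the $(\epsilon, \delta)$-DP guarantee of Step 1.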
Theorem 1. For any $0 < \epsilon, \delta \le 1$, Algorithm 1 is $(\epsilon, \delta)$-differentially private.

Proof. By Section 3 in [?], Step 1 makes the matrix $(\epsilon, \delta)$-differentially private. Thus, Algorithm 1 is $(\epsilon, \delta)$-differentially private by the post-processing property of differential privacy [?].
For the matrix $\hat{\Sigma}$ in (10) after the first step of thresholding, we have the following key lemma.
Lemma 3. For every fixed $1 \le i, j \le p$, there exists a constant $C > 0$ such that, with probability at least $1 - Cp^{-9/2}$, the following holds:
$$|\hat{\sigma}_{ij} - \sigma_{ij}| \le 4\min\Big\{|\sigma_{ij}|,\ \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big\}. \qquad (11)$$
Proof of Lemma 3. Let $\Sigma^* = (\sigma^*_{ij})_{1 \le i,j \le p}$ and $Z = (z_{ij})_{1 \le i,j \le p}$. Define the event
$$A_{ij} = \Big\{|\tilde{\sigma}_{ij}| > \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big\}.$$
We have
$$|\hat{\sigma}_{ij} - \sigma_{ij}| = |\sigma_{ij}| \cdot \mathbb{I}(A^c_{ij}) + |\tilde{\sigma}_{ij} - \sigma_{ij}| \cdot \mathbb{I}(A_{ij}). \qquad (12)$$
By the triangle inequality, it is easy to see that
$$A_{ij} = \Big\{|\tilde{\sigma}_{ij} - \sigma_{ij} + \sigma_{ij}| > \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big\} \subseteq \Big\{|\tilde{\sigma}_{ij} - \sigma_{ij}| > \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon} - |\sigma_{ij}|\Big\}$$
and
$$A^c_{ij} = \Big\{|\tilde{\sigma}_{ij} - \sigma_{ij} + \sigma_{ij}| \le \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big\} \subseteq \Big\{|\tilde{\sigma}_{ij} - \sigma_{ij}| > |\sigma_{ij}| - \Big(\gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big)\Big\}.$$
Depending on the value of $\sigma_{ij}$, we consider the following three cases.
Case 1: $|\sigma_{ij}| \le \frac{\gamma}{4}\sqrt{\frac{\log p}{n}} + \sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}$. For this case, we have
$$\mathbb{P}(A_{ij}) \le \mathbb{P}\Big(|\tilde{\sigma}_{ij} - \sigma_{ij}| > \frac{3\gamma}{4}\sqrt{\frac{\log p}{n}} + 3\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big) \le C_1 p^{-9/2} + 2p^{-9/2}. \qquad (13)$$
This is due to the following:
$$\mathbb{P}\Big(|\tilde{\sigma}_{ij} - \sigma_{ij}| > \frac{3\gamma}{4}\sqrt{\frac{\log p}{n}} + 3\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big) \qquad (14)$$
$$\le \mathbb{P}\Big(|\sigma^*_{ij} - \sigma_{ij}| > \frac{3\gamma}{4}\sqrt{\frac{\log p}{n}} + 3\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon} - |z_{ij}|\Big) \qquad (15)$$
$$= \mathbb{P}\Big(B_{ij} \cap \Big\{3\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon} - |z_{ij}| > 0\Big\}\Big) \qquad (16)$$
$$\quad + \mathbb{P}\Big(B_{ij} \cap \Big\{3\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon} - |z_{ij}| \le 0\Big\}\Big) \qquad (17)$$
$$\le \mathbb{P}\Big(|\sigma^*_{ij} - \sigma_{ij}| > \frac{3\gamma}{4}\sqrt{\frac{\log p}{n}}\Big) + \mathbb{P}\Big(3\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon} \le |z_{ij}|\Big) \qquad (18)$$
$$\le C_1 p^{-9/2} + 2p^{-9/2}, \qquad (19)$$
where $B_{ij}$ denotes the event $B_{ij} = \big\{|\sigma^*_{ij} - \sigma_{ij}| > \frac{3\gamma}{4}\sqrt{\frac{\log p}{n}} + 3\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon} - |z_{ij}|\big\}$, and the last inequality is due to Lemmas 1 and 2.
Thus, by (12), with probability at least $1 - C_1 p^{-9/2} - 2p^{-9/2}$ we have $|\hat{\sigma}_{ij} - \sigma_{ij}| = |\sigma_{ij}|$, which satisfies (11).
Case 2: $|\sigma_{ij}| \ge 2\gamma\sqrt{\frac{\log p}{n}} + 8\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}$. For this case, we have
$$\mathbb{P}(A^c_{ij}) \le \mathbb{P}\Big(|\tilde{\sigma}_{ij} - \sigma_{ij}| \ge \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big) \le C_1 p^{-9/2} + 2p^{-8},$$
where the proof is the same as that of (14)-(19). Thus, with probability at least $1 - C_1 p^{-9/2} - 2p^{-8}$, we have
$$|\hat{\sigma}_{ij} - \sigma_{ij}| = |\tilde{\sigma}_{ij} - \sigma_{ij}|. \qquad (20)$$
Also, by (9), (11) also holds.
Case 3: Otherwise,
$$\frac{\gamma}{4}\sqrt{\frac{\log p}{n}} + \sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon} \le |\sigma_{ij}| \le 2\gamma\sqrt{\frac{\log p}{n}} + 8\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}.$$
For this case, we have
$$|\hat{\sigma}_{ij} - \sigma_{ij}| = |\sigma_{ij}| \quad \text{or} \quad |\hat{\sigma}_{ij} - \sigma_{ij}| = |\tilde{\sigma}_{ij} - \sigma_{ij}|. \qquad (21)$$
When $|\sigma_{ij}| \le \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}$, we can see from (9) that, with probability at least $1 - 2p^{-6} - C_1 p^{-8}$,
$$|\tilde{\sigma}_{ij} - \sigma_{ij}| \le \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon} \le 4|\sigma_{ij}|,$$
so (11) also holds. Otherwise, when $|\sigma_{ij}| \ge \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}$, (11) also holds. Thus, Lemma 3 is true.
By Lemma 3, we have the following upper bound on the $\ell_2$-norm error of $\Sigma^+$.

Theorem 2. The output $\Sigma^+$ of Algorithm 1 satisfies
$$\mathbb{E}\|\Sigma^+ - \Sigma\|_2^2 = O\Big(\frac{s^2\log p}{n} + \frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2} + \frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\Big), \qquad (22)$$
where the expectation is taken over the coins of the algorithm and the randomness of $\{x_1, x_2, \dots, x_n\}$.
Proof of Theorem 2. We first show that $\|\Sigma^+ - \Sigma\|_2 \le 2\|\hat{\Sigma} - \Sigma\|_2$. This is due to the following:
$$\|\Sigma^+ - \Sigma\|_2 \le \|\Sigma^+ - \hat{\Sigma}\|_2 + \|\hat{\Sigma} - \Sigma\|_2 \le \max_{i:\lambda_i \le 0}|\lambda_i| + \|\hat{\Sigma} - \Sigma\|_2 \le \max_{i:\lambda_i \le 0}|\lambda_i - \lambda_i(\Sigma)| + \|\hat{\Sigma} - \Sigma\|_2 \le 2\|\hat{\Sigma} - \Sigma\|_2,$$
where the third inequality is due to the fact that $\Sigma$ is positive semi-definite.
This means that we only need to bound $\|\hat{\Sigma} - \Sigma\|_2$. Since $\hat{\Sigma} - \Sigma$ is symmetric, we know that $\|\hat{\Sigma} - \Sigma\|_2 \le \|\hat{\Sigma} - \Sigma\|_1$ [?]. Thus, it suffices to prove that the bound in (22) holds for $\|\hat{\Sigma} - \Sigma\|_1$.
We define the event $E_{ij}$ as
$$E_{ij} = \Big\{|\hat{\sigma}_{ij} - \sigma_{ij}| \le 4\min\Big\{|\sigma_{ij}|,\ \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big\}\Big\}. \qquad (23)$$
Then, by Lemma 3, we have $\mathbb{P}(E_{ij}) \ge 1 - 2C_1 p^{-9/2}$.
Let $D = (d_{ij})_{1 \le i,j \le p}$, where $d_{ij} = (\hat{\sigma}_{ij} - \sigma_{ij}) \cdot \mathbb{I}(E^c_{ij})$. Then we have
$$\|\hat{\Sigma} - \Sigma\|_1^2 \le \|\hat{\Sigma} - \Sigma - D + D\|_1^2 \le 2\|\hat{\Sigma} - \Sigma - D\|_1^2 + 2\|D\|_1^2 \le 4\Big(\sup_j \sum_{i \ne j} |\hat{\sigma}_{ij} - \sigma_{ij}|\,\mathbb{I}(E_{ij})\Big)^2 + 2\|D\|_1^2 + O\Big(\frac{\gamma^2\log p}{n} + \frac{\log p\log\frac{1}{\delta}}{n^2\epsilon^2}\Big). \qquad (24)$$
We first bound the first term of (24). By the definition of $E_{ij}$ and Lemma 3, we can upper bound it by
$$\Big(\sup_j \sum_{i \ne j} |\hat{\sigma}_{ij} - \sigma_{ij}|\,\mathbb{I}(E_{ij})\Big)^2 \le 16\Big(\sup_j \sum_{i \ne j} \min\Big\{|\sigma_{ij}|,\ \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big\}\Big)^2 \le 16 s^2\Big(\gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}\Big)^2 \le O\Big(\frac{s^2\gamma^2\log p}{n} + \frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2}\Big), \qquad (25)$$
where the second inequality is due to the assumption that at most $s$ elements of $(\sigma_{ij})_{i \ne j}$ are non-zero.
For the second term in (24), we have
$$\mathbb{E}\|D\|_1^2 \le p\sum_{ij}\mathbb{E}\,d_{ij}^2 = p\,\mathbb{E}\sum_{ij}\big[(\tilde{\sigma}_{ij} - \sigma_{ij})^2\,\mathbb{I}\big(E^c_{ij} \cap \{\hat{\sigma}_{ij} = \tilde{\sigma}_{ij}\}\big) + (\hat{\sigma}_{ij} - \sigma_{ij})^2\,\mathbb{I}\big(E^c_{ij} \cap \{\hat{\sigma}_{ij} = 0\}\big)\big]$$
$$= p\,\mathbb{E}\sum_{ij}(\tilde{\sigma}_{ij} - \sigma_{ij})^2\,\mathbb{I}(E^c_{ij}) + p\sum_{ij}\mathbb{E}\,\sigma_{ij}^2\,\mathbb{I}\big(E^c_{ij} \cap \{\hat{\sigma}_{ij} = 0\}\big). \qquad (26)$$
For the first term in (26), we have
$$p\sum_{ij}\mathbb{E}\big\{(\tilde{\sigma}_{ij} - \sigma_{ij})^2\,\mathbb{I}(E^c_{ij})\big\} \le p\sum_{ij}\big[\mathbb{E}(\tilde{\sigma}_{ij} - \sigma_{ij})^6\big]^{\frac{1}{3}}\,\mathbb{P}^{\frac{2}{3}}(E^c_{ij}) \le C p \cdot p^2\,\frac{\log\frac{1}{\delta}}{n^2\epsilon^2}\,p^{-3} = O\Big(\frac{\log\frac{1}{\delta}}{n^2\epsilon^2}\Big), \qquad (27)$$
where the first inequality is due to Hölder's inequality and the second is due to the fact that, for some constant $C_3 > 0$,
$$\mathbb{E}(\tilde{\sigma}_{ij} - \sigma_{ij})^6 \le C_3\big[\mathbb{E}(\sigma^*_{ij} - \sigma_{ij})^6 + \mathbb{E}\,z_{ij}^6\big].$$
Since $z_{ij}$ is Gaussian, we have $\mathbb{E}\,z_{ij}^6 \le C_4\sigma_1^6 = O\big(\big(\frac{\log\frac{1}{\delta}}{n^2\epsilon^2}\big)^3\big)$ for some constant $C_4$ [?]. For the term $\mathbb{E}(\sigma^*_{ij} - \sigma_{ij})^6$: since $x_i$ is sampled from the sub-Gaussian distribution in (2), by the Whittle inequality (Theorem 2 in [?] or [?]) the quadratic form $\sigma^*_{ij}$ satisfies $\mathbb{E}(\sigma^*_{ij} - \sigma_{ij})^6 \le C_5\sigma_1^6$ for some positive constant $C_5 > 0$.
For the second term of (26), we have
$$p\sum_{ij}\mathbb{E}\,\sigma_{ij}^2\,\mathbb{I}\big(E^c_{ij} \cap \{\hat{\sigma}_{ij} = 0\}\big)$$
$$= p\sum_{ij}\mathbb{E}\,\sigma_{ij}^2\,\mathbb{I}\Big(|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big)\,\mathbb{I}\Big(|\tilde{\sigma}_{ij}| \le \gamma\sqrt{\tfrac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big)$$
$$\le p\sum_{ij}\mathbb{E}\,\sigma_{ij}^2\,\mathbb{I}\Big(|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big)\,\mathbb{I}\Big(|\sigma_{ij}| - |\tilde{\sigma}_{ij} - \sigma_{ij}| \le \gamma\sqrt{\tfrac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{E}\,\mathbb{I}\Big(|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big)\,\mathbb{I}\Big(|\tilde{\sigma}_{ij} - \sigma_{ij}| \ge \tfrac{3}{4}|\sigma_{ij}|\Big)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{E}\,\mathbb{I}\Big(|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big)\,\mathbb{I}\Big(|\sigma^*_{ij} - \sigma_{ij}| + |z_{ij}| \ge \tfrac{3}{4}|\sigma_{ij}|\Big)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big[\Big\{|\sigma^*_{ij} - \sigma_{ij}| \ge \tfrac{3}{4}|\sigma_{ij}| - |z_{ij}|\Big\} \cap \Big\{|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big\}\Big] \qquad (28)$$
$$= p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big[\Big\{|\sigma^*_{ij} - \sigma_{ij}| \ge \tfrac{3}{4}|\sigma_{ij}| - |z_{ij}|\Big\} \cap \Big\{|z_{ij}| \le \tfrac{1}{4}|\sigma_{ij}|\Big\} \cap \Big\{|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big\}\Big]$$
$$\quad + p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big[\Big\{|\sigma^*_{ij} - \sigma_{ij}| \ge \tfrac{3}{4}|\sigma_{ij}| - |z_{ij}|\Big\} \cap \Big\{|z_{ij}| \ge \tfrac{1}{4}|\sigma_{ij}|\Big\} \cap \Big\{|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big\}\Big] \qquad (29)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big[\Big\{|\sigma^*_{ij} - \sigma_{ij}| \ge \tfrac{1}{2}|\sigma_{ij}|\Big\} \cap \Big\{|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big\}\Big] + p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big[\Big\{|z_{ij}| \ge \tfrac{1}{4}|\sigma_{ij}|\Big\} \cap \Big\{|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big\}\Big]. \qquad (30)$$
For the second term of (30), by Lemmas 1 and 2 we have
$$p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|z_{ij}| \ge \tfrac{1}{4}|\sigma_{ij}|\Big\} \cap \Big\{|\sigma_{ij}| > 4\gamma\sqrt{\tfrac{\log p}{n}} + 16\sqrt{2\log(1.25/\delta)}\,\tfrac{\sqrt{\log p}}{n\epsilon}\Big\}\Big)$$
$$\le C p\sum_{ij}\sigma_{ij}^2\exp\Big(-\frac{\big(\gamma\sqrt{\frac{\log p}{n}} + 4\sigma_1\sqrt{\log p}\big)^2}{2\sigma_1^2}\Big)\exp\Big(-\frac{\sigma_{ij}^2}{32\sigma_1^2}\Big)$$
$$\le C p\sum_{ij}\sigma_{ij}^2\exp\Big(-\frac{\big(\gamma\sqrt{\frac{\log p}{n}} + 4\sigma_1\sqrt{\log p}\big)^2}{2\sigma_1^2}\Big)\frac{32\sigma_1^2}{\sigma_{ij}^2}$$
$$\le C\sigma_1^2\,p \cdot p^2\exp\Big(-\frac{\gamma^2\log p}{2n\sigma_1^2}\Big)\,p^{-8} \qquad (31)$$
$$\le C\sigma_1^2\,p^{-5}\Big(\frac{2n\sigma_1^2}{\gamma^2\log p}\Big)^2 = O\Big(\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\Big), \qquad (32)$$
where we used $4\sigma_1\sqrt{\log p} = 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{n\epsilon}$ (since $\sigma_1 = \frac{\sqrt{2\log(1.25/\delta)}}{n\epsilon}$) and the elementary bound $e^{-x} \le \frac{C}{x^2}$ for $x > 0$.
For the first term of (30), by Lemma 2 we have
$$p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|\sigma^*_{ij} - \sigma_{ij}| \ge \tfrac{1}{2}|\sigma_{ij}|\Big\} \cap \Big\{|\sigma_{ij}| \ge 4\gamma\sqrt{\tfrac{\log p}{n}}\Big\}\Big)$$
$$\le C p\sum_{ij}\sigma_{ij}^2\exp\Big(-\frac{2n\sigma_{ij}^2}{\gamma^2}\Big)\,\mathbb{I}\Big(|\sigma_{ij}| \ge 4\gamma\sqrt{\tfrac{\log p}{n}}\Big)$$
$$= \frac{C p}{n}\sum_{ij}\Big[n\sigma_{ij}^2\exp\Big(-\frac{n\sigma_{ij}^2}{\gamma^2}\Big)\Big]\exp\Big(-\frac{n\sigma_{ij}^2}{\gamma^2}\Big)\,\mathbb{I}\Big(|\sigma_{ij}| \ge 4\gamma\sqrt{\tfrac{\log p}{n}}\Big)$$
$$\le \frac{C p}{n}\sum_{ij}n\sigma_{ij}^2\,\frac{\gamma^2}{n\sigma_{ij}^2}\exp(-16\log p) \qquad (33)$$
$$\le \frac{C\gamma^2 p^3}{n}\,p^{-16} = O\Big(\frac{1}{n}\Big). \qquad (34)$$
Thus, in total, we have $\mathbb{E}\|D\|_1^2 = O\big(\frac{\log\frac{1}{\delta}}{n^2\epsilon^2} + \frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\big)$. This means that
$$\mathbb{E}\|\hat{\Sigma} - \Sigma\|_1^2 = O\Big(\frac{s^2\log p}{n} + \frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2} + \frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\Big),$$
which completes the proof.
Corollary 1. For any $1 \le w \le \infty$, the matrix $\hat{\Sigma}$ in (10) after the first step of thresholding satisfies
$$\|\hat{\Sigma} - \Sigma\|_w^2 \le O\Big(\frac{s^2\log p}{n} + \frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2} + \frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\Big), \qquad (35)$$
where the $\ell_w$-norm of a matrix $A$ is defined as $\|A\|_w = \sup_{x \ne 0} \frac{\|Ax\|_w}{\|x\|_w}$. Specifically, for a matrix $A = (a_{ij})_{1 \le i,j \le p}$, $\|A\|_1 = \sup_j \sum_i |a_{ij}|$ is the maximum absolute column sum, and $\|A\|_\infty = \sup_i \sum_j |a_{ij}|$ is the maximum absolute row sum.
Comparing the bound in the above corollary with the optimal minimax rate $\Theta\big(\frac{s^2\log p}{n}\big)$ in [?] for the non-private case, we can see that the impact of differential privacy is an additional error of $O\big(\frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2} + \frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\big)$. It is an open problem to determine whether the bound in Theorem 2 is tight.
Proof of Corollary 1. By the Riesz-Thorin interpolation theorem [?], we have
$$\|A\|_w \le \max\{\|A\|_1, \|A\|_2, \|A\|_\infty\}$$
for any matrix $A$ and any $1 \le w \le \infty$. Since $\hat{\Sigma} - \Sigma$ is a symmetric matrix, we have $\|\hat{\Sigma} - \Sigma\|_2 \le \|\hat{\Sigma} - \Sigma\|_1$ and $\|\hat{\Sigma} - \Sigma\|_1 = \|\hat{\Sigma} - \Sigma\|_\infty$. Thus, the corollary follows from the proof of Theorem 2.
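The three anchor norms in the corollary are straightforward to evaluate; the following small helper (our naming, a sketch rather than anything from the paper) makes the definitions above concrete.

```python
import numpy as np

def lw_norm(A, w):
    """Operator norm ||A||_w = sup ||Ax||_w / ||x||_w for w in {1, 2, inf}:
    max absolute column sum, spectral norm, max absolute row sum."""
    if w == 1:
        return np.abs(A).sum(axis=0).max()
    if w == 2:
        return np.linalg.norm(A, 2)    # largest singular value
    if w == np.inf:
        return np.abs(A).sum(axis=1).max()
    raise ValueError("only w in {1, 2, inf} are computed in this sketch")
```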
4.3. Extension to Local Differential Privacy

One advantage of our Algorithm 1 is that it can be easily extended to the local differential privacy (LDP) model.

Differential privacy in the local model. In LDP, we have a data universe $\mathcal{X}$, $n$ players with each holding a private data record $x_i \in \mathcal{X}$, and a server that is in charge of coordinating the protocol. An LDP protocol proceeds in $T$ rounds. In each round, the server sends a message, sometimes called a query, to a subset of the players, requesting them to run a particular algorithm. Based on the query, each player $i$ in the subset selects an algorithm $Q_i$, runs it on her data, and sends the output back to the server.

Definition 3 ([?]). An algorithm $Q$ is $(\epsilon, \delta)$-locally differentially private (LDP) if for all pairs $x, x' \in \mathcal{X}$ and for all events $E$ in the output space of $Q$, we have
$$\mathbb{P}[Q(x) \in E] \le e^{\epsilon}\,\mathbb{P}[Q(x') \in E] + \delta.$$
A multi-player protocol is $\epsilon$-LDP if for all possible inputs and runs of the protocol, the transcript of player $i$'s interaction with the server is $\epsilon$-LDP. If $T = 1$, we say that the protocol is $(\epsilon, \delta)$ non-interactive LDP.
Algorithm 2 LDP-Thresholding
Input: $\{x_1, x_2, \dots, x_n\} \sim P \in \mathcal{P}_p(\sigma^2, s)$, and $\epsilon, \delta \in (0, 1)$.
1: for each $i \in [n]$ do
2: Denote $\widetilde{x_i x_i^T} = x_i x_i^T + z_i$, where $z_i \in \mathbb{R}^{p \times p}$ is a symmetric matrix whose upper triangle (including the diagonal) consists of i.i.d. samples from $\mathcal{N}(0, \sigma^2)$ with $\sigma^2 = \frac{2\log(1.25/\delta)}{\epsilon^2}$, and each lower-triangle entry is copied from its upper-triangle counterpart.
3: end for
4: Compute $\tilde{\Sigma} = (\tilde{\sigma}_{ij})_{1 \le i,j \le p} = \frac{1}{n}\sum_{i=1}^{n} \widetilde{x_i x_i^T}$.
5: Define the thresholding estimator $\hat{\Sigma} = (\hat{\sigma}_{ij})_{1 \le i,j \le p}$ as
$$\hat{\sigma}_{ij} = \tilde{\sigma}_{ij} \cdot \mathbb{I}\Big[|\tilde{\sigma}_{ij}| > \gamma\sqrt{\frac{\log p}{n}} + 4\sqrt{2\log(1.25/\delta)}\,\frac{\sqrt{\log p}}{\sqrt{n}\,\epsilon}\Big]. \qquad (36)$$
6: Let the eigen-decomposition of $\hat{\Sigma}$ be $\hat{\Sigma} = \sum_{i=1}^{p} \lambda_i v_i v_i^T$, let $\lambda_i^+ = \max\{\lambda_i, 0\}$ be the positive part of $\lambda_i$, and define $\Sigma^+ = \sum_{i=1}^{p} \lambda_i^+ v_i v_i^T$.
7: return $\Sigma^+$.
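A NumPy sketch of Algorithm 2 is given below; as before, `gamma` is the user-supplied sub-Gaussian constant, and in a real deployment the loop body would run on each player's device rather than on the server.

```python
import numpy as np

def ldp_thresholding(X, eps, delta, gamma=1.0, rng=None):
    """Sketch of Algorithm 2 (LDP-Thresholding)."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / eps   # per-player noise scale
    reports = []
    for x in X:                                         # done locally by player i
        upper = np.triu(rng.normal(scale=sigma, size=(p, p)))
        z = upper + np.triu(upper, 1).T                 # symmetric noise matrix
        reports.append(np.outer(x, x) + z)
    sigma_tilde = np.mean(reports, axis=0)              # server-side aggregation
    # Threshold of Eq. (36); note sqrt(n), not n, in the privacy term.
    tau = (gamma * np.sqrt(np.log(p) / n)
           + 4.0 * np.sqrt(2.0 * np.log(1.25 / delta)) * np.sqrt(np.log(p)) / (np.sqrt(n) * eps))
    sigma_hat = np.where(np.abs(sigma_tilde) > tau, sigma_tilde, 0.0)
    vals, vecs = np.linalg.eigh(sigma_hat)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T
```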
Inspired by Algorithm 1, it is easy to extend our DP algorithm to the LDP model. The idea is that each player $i$ perturbs her own term $x_i x_i^T$, and the server aggregates the noisy versions of the covariance; see Algorithm 2 for details. The following theorem gives the error bound for the output of Algorithm 2; its proof is almost the same as that of Theorem 2, and the $\ell_2$-norm and $\ell_w$-norm bounds coincide asymptotically.
Theorem 3. The output $\Sigma^+$ of Algorithm 2 satisfies
$$\mathbb{E}\|\Sigma^+ - \Sigma\|_2^2 = O\Big(\frac{s^2\log p\log\frac{1}{\delta}}{n\epsilon^2}\Big), \qquad (37)$$
where the expectation is taken over the coins of the algorithm and the randomness of $\{x_1, x_2, \dots, x_n\}$. Moreover, $\hat{\Sigma}$ in (36) satisfies $\|\hat{\Sigma} - \Sigma\|_w^2 = O\big(\frac{s^2\log p\log\frac{1}{\delta}}{n\epsilon^2}\big)$.
Compared with the upper bound of $O\big(\frac{s^2\log p}{n} + \frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2} + \frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\big)$ in the central $(\epsilon, \delta)$-DP model, the upper bound of $O\big(\frac{s^2\log p\log\frac{1}{\delta}}{n\epsilon^2}\big)$ in the local model is much larger. We also note that the upper bound in the local model is tight, as shown recently in [?].
5. Experiments
In this section, we evaluate the performance of Algorithms 1 and 2 on synthetic datasets.
Data Generation. We first generate a symmetric sparse matrix $\bar{S}$ with sparsity ratio $s_r$; that is, the matrix has $s_r \times p \times p$ non-zero entries. Then we let $S = \bar{S} + \mu I_p$ for some constant $\mu$ to make $S$ positive semi-definite, and scale it to $\Sigma = \kappa S$ by a constant $\kappa$ which makes the norm of the samples less than 1 (with high probability)¹. Finally, we sample $\{x_1, \dots, x_n\}$ from the multivariate Gaussian distribution $\mathcal{N}(0, \Sigma)$. In this paper, we set $p = 50$ and $n = 200$.
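A sketch of this generation procedure follows; the particular choices of $\mu$ and $\kappa$ are illustrative and not values prescribed by the paper (we pick $\kappa$ so that $\mathrm{tr}(\Sigma) = 1/4$, giving $\mathbb{E}\|x\|_2^2 = 1/4$ and hence $\|x\|_2 \le 1$ with high probability).

```python
import numpy as np

def make_sparse_covariance(p=50, sparse_ratio=0.2, rng=None):
    """Synthetic sparse covariance following the recipe above."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random((p, p)) < sparse_ratio
    S_bar = rng.standard_normal((p, p)) * mask
    S_bar = (S_bar + S_bar.T) / 2.0                        # symmetrize
    mu = max(0.0, -np.linalg.eigvalsh(S_bar).min()) + 0.1  # shift to make S PSD
    S = S_bar + mu * np.eye(p)
    kappa = 0.25 / np.trace(S)                             # scale samples down
    return kappa * S

rng = np.random.default_rng(0)
Sigma = make_sparse_covariance(p=50, rng=rng)
X = rng.multivariate_normal(np.zeros(50), Sigma, size=200)  # n = 200 samples
```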
Experimental Settings. To measure the performance, we compute the relative error in the $\ell_2$ and $\ell_1$ norms, respectively, that is, $\frac{\|\Sigma - \Sigma^+\|_2}{\|\Sigma\|_2}$ or $\frac{\|\Sigma - \Sigma^+\|_1}{\|\Sigma\|_1}$, against the sample size $n$ in three different settings: 1) we set $p = 100$, $\epsilon = 1$, $\delta = \frac{1}{n}$, and vary the sparsity ratio $s_r \in \{0.1, 0.2, 0.3, 0.5\}$; 2) we set $\epsilon = 1$, $\delta = \frac{1}{n}$, $s_r = 0.2$, and let the dimensionality $p$ vary in $\{50, 100, 200, 500\}$; 3) we fix $p = 200$, $\delta = \frac{1}{n}$, $s_r = 0.2$, and change the privacy level $\epsilon \in \{0.1, 0.5, 1, 2\}$. We run each experiment 20 times and take the average error as the final result.
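Under these settings, the plotted quantities can be reproduced along the lines of the following sketch, which averages the relative errors of `dp_thresholding` (from the sketch in Section 4.2) over the algorithm's coins.

```python
import numpy as np

def relative_errors(Sigma, X, eps, delta, trials=20, rng=None):
    """Average relative l2- and l1-norm errors over repeated private runs."""
    rng = np.random.default_rng() if rng is None else rng
    e2, e1 = [], []
    for _ in range(trials):
        Sigma_plus = dp_thresholding(X, eps, delta, rng=rng)
        diff = Sigma - Sigma_plus
        e2.append(np.linalg.norm(diff, 2) / np.linalg.norm(Sigma, 2))
        e1.append(np.abs(diff).sum(axis=0).max() / np.abs(Sigma).sum(axis=0).max())
    return float(np.mean(e2)), float(np.mean(e1))
```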
Experimental Results. Figures ?? and ?? show the results of DP-Thresholding (Algorithm 1) measured in $\ell_2$ and $\ell_1$ relative error, respectively. Figures ?? and ?? show the results of LDP-Thresholding (Algorithm 2) in $\ell_2$ and $\ell_1$ relative error, respectively. From the figures we can see the following. 1) If the sparsity ratio is large, i.e., the underlying covariance matrix is denser, the relative error is larger; this is because the error depends on the sparsity $s$, as shown in Theorems 2 and 3. 2) The dimensionality only slightly affects the relative error; even if we double the value of $p$, the error increases only slightly. This is consistent with our theoretical analysis in Theorems 2 and 3, which says that the error of our private estimators depends on $p$ only logarithmically (i.e., $\log p$). 3) As the privacy parameter $\epsilon$ increases (which means a weaker privacy guarantee), the relative error decreases, which again matches Theorems 2 and 3.
¹Although the distribution is not bounded by 1, as we saw in the previous section, the same result can be obtained as long as the $\ell_2$ norm of the samples is bounded by 1.