
Differentially private high dimensional sparse covariance matrix estimation

Item Type Article

Authors Wang, Di; Xu, Jinhui

Citation Wang, D., & Xu, J. (2021). Differentially private high dimensional sparse covariance matrix estimation. Theoretical Computer Science. doi:10.1016/j.tcs.2021.03.001

Eprint version Post-print

DOI 10.1016/j.tcs.2021.03.001

Publisher Elsevier BV

Journal Theoretical Computer Science

Rights NOTICE: this is the author's version of a work that was accepted for publication in Theoretical Computer Science. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document.

Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Theoretical Computer Science (2021-03-10), DOI: 10.1016/j.tcs.2021.03.001. © 2021. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

Download date 2023-12-01 06:57:22

Link to Item http://hdl.handle.net/10754/668232

Differentially Private High Dimensional Sparse Covariance Matrix Estimation✩,✩✩

Di Wang^{a,∗}, Jinhui Xu^{b}

^{a} Division of Computer, Electrical and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia

^{b} Department of Computer Science and Engineering, State University of New York at Buffalo, 338 Davis Hall, Buffalo, NY 14260

Abstract

In this paper, we study the problem of estimating the covariance matrix under differential privacy, where the underlying covariance matrix is assumed to be sparse and of high dimension. We propose a new method, called DP-Thresholding, that achieves a non-trivial $\ell_2$-norm based error bound whose dependence on the dimension is only logarithmic instead of polynomial; this is significantly better than the existing bounds, which are obtained by adding noise directly to the empirical covariance matrix. We also extend the $\ell_2$-norm based error bound to a general $\ell_w$-norm based one for any $1 \le w \le \infty$, and show that they share the same upper bound asymptotically. Our approach can be easily extended to local differential privacy. Experiments on synthetic datasets show results that are consistent with the theoretical claims.

Keywords: Differential privacy, sparse covariance estimation, high dimensional statistics

1. Introduction

In recent years, Machine Learning and Statistical Estimation have had a profound impact on many applied domains such as social sciences, genomics, and medicine. A frequently encountered challenge in their applications is how to deal with the high dimensionality of the datasets, especially for those in genomics and in educational and psychological research. A commonly adopted strategy for dealing with this issue is to assume that the underlying structures of the parameters are sparse.

Another often encountered challenge is how to handle sensitive data, such as those in social science, biomedicine, and genomics. A promising approach is to use differentially private mechanisms for the statistical inference and learning tasks.

✩ A preliminary version of this paper appeared in Proceedings of the 53rd Annual Conference on Information Sciences and Systems (CISS 2019).

✩✩ This research was supported in part by the National Science Foundation (NSF) through grants CCF-1422324 and CCF-1716400.

∗ Corresponding author

Differential Privacy (DP) [?] is a widely-accepted criterion that provides provable protection against identification and is resilient to arbitrary auxiliary information that might be available to attackers. Since its introduction over a decade ago, a rich line of work has become available, making differential privacy a compelling privacy-enhancing technology for many organizations, such as Uber [?], Google [?], and Apple [?].

Estimating or studying high dimensional datasets while keeping them (locally) differentially private can be quite challenging for many problems, such as sparse linear regression [?], sparse mean estimation [?], and the selection problem [?]. However, there is also evidence showing that for some problems the loss under privacy constraints can be quite small compared with their non-private counterparts. Examples of this nature include Empirical Risk Minimization under sparsity constraints [? ?], high dimensional sparse PCA [? ? ?], sparse inverse covariance estimation [?], and high-dimensional distribution estimation [?]. Thus, it is desirable to determine which high dimensional problems can be learned or estimated efficiently in a private manner.

In this paper, we aim to give an answer to this question for a simple but fundamental problem in machine learning and statistics, namely, estimating the underlying sparse covariance matrix of a bounded sub-Gaussian distribution. For this problem, we propose a simple but nontrivial $(\epsilon,\delta)$-DP method, DP-Thresholding, and show that the squared $\ell_w$-norm error for any $1\le w\le\infty$ is bounded by $O\big(\frac{s^2\log p}{n}+\frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2}+\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\big)$, where $n$ is the sample size, $p$ is the dimension of the underlying space, and $s$ is the sparsity of each row of the underlying covariance matrix. Moreover, our method can be easily extended to the local differential privacy model with an upper bound of $O\big(\frac{s^2\log p\log\frac{1}{\delta}}{n\epsilon^2}\big)$. Experiments on synthetic datasets confirm the theoretical claims. To the best of our knowledge, this is the first paper studying the problem of estimating a high dimensional sparse covariance matrix under (local) differential privacy.

2. Related Work

Recently, there have been several papers studying private distribution estimation, such as [? ? ? ? ?]. For distribution estimation under the central differential privacy model, [?] considers 1-dimensional private mean estimation of a Gaussian distribution with (un)known variance. The work that is probably most closely related to ours is [?], which studies the problem of privately learning multivariate Gaussian and product distributions. The main differences from our work are the following. Firstly, our goal is to estimate the covariance of a sub-Gaussian distribution. Even though the class of distributions considered in our paper is larger than the one in [?], it carries an additional assumption requiring the $\ell_2$ norm of a sample from the distribution to be bounded by 1, which means that it does not include the general Gaussian distribution. Secondly, although [?] also considers the high dimensional case, it does not assume sparsity of the underlying covariance matrix. Thus, its error bound depends on the dimensionality $p$ polynomially, which is large in the high dimensional case ($p \gg n$), while the dependence in our paper is only logarithmic (i.e., $\log p$). Thirdly, the error in [?] is measured by the total variation distance, while ours is measured by the $\ell_w$-norm; thus, the two results are not directly comparable. Fourthly, it seems difficult to extend the methods of [?] to the local model. Recently, [?] also studied covariance matrix estimation, via iterative eigenvector sampling. However, their method is only for the low dimensional case, and the error is measured with respect to the Frobenius norm.

Distribution estimation under local differential privacy has been studied in [? ?].

However, both of them study only the 1-dimensional Gaussian distribution, which is quite different from the class of distributions considered in our paper.

In this paper, we mainly use the Gaussian mechanism on the covariance matrix, which has been studied in [? ? ?]. However, as will be shown later, simply outputting the perturbed covariance matrix can incur a large error and is thus insufficient for our problem.

Compared to these previous works, the problem in this paper is clearly more complicated, since here we assume the data lie in a high dimensional space where $p \gg n$.

3. Preliminaries

3.1. Differential Privacy

Differential privacy [?] is by now a de facto standard for statistical data privacy, constituting a strong standard of privacy guarantees for algorithms on aggregate databases. DP requires that there be no significant change in the outcome distribution under a single entry change to the dataset. We say that two datasets $D, D'$ are neighbors if they differ by only one entry, denoted as $D \sim D'$.

Definition 1 (Differential Privacy [?]). A randomized algorithm $\mathcal{A}$ is $(\epsilon,\delta)$-differentially private (DP) if for all neighboring datasets $D, D'$ and for all measurable events $S$ in the output space of $\mathcal{A}$, the following holds:

$$\mathbb{P}(\mathcal{A}(D)\in S)\le e^{\epsilon}\,\mathbb{P}(\mathcal{A}(D')\in S)+\delta.$$

When $\delta = 0$, $\mathcal{A}$ is $\epsilon$-differentially private.

We will use the Gaussian mechanism [?] to guarantee $(\epsilon,\delta)$-DP.

Definition 2 (Gaussian Mechanism [?]). Given any function $q: \mathcal{X}^n \to \mathbb{R}^p$, the Gaussian mechanism is defined as

$$\mathcal{M}_G(D, q, \epsilon) = q(D) + Y,$$

where $Y$ is drawn from the Gaussian distribution $\mathcal{N}(0, \sigma^2 I_p)$ with $\sigma \ge \frac{\sqrt{2\log(1.25/\delta)}\,\Delta_2(q)}{\epsilon}$. Here $\Delta_2(q)$ is the $\ell_2$-sensitivity of the function $q$, i.e.,

$$\Delta_2(q)=\sup_{D\sim D'}\|q(D)-q(D')\|_2.$$

The Gaussian mechanism preserves $(\epsilon,\delta)$-differential privacy.
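For concreteness, the following is a minimal Python sketch of the Gaussian mechanism for a vector-valued query; the function `gaussian_mechanism` and the mean-estimation example are our own illustration under the stated boundedness assumption, not code from the paper.

```python
import numpy as np

def gaussian_mechanism(q_value, l2_sensitivity, eps, delta, rng=None):
    """Add Gaussian noise calibrated to the l2-sensitivity of a query.

    Implements Definition 2: sigma >= sqrt(2 log(1.25/delta)) * Delta_2(q) / eps.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / eps
    return q_value + rng.normal(0.0, sigma, size=np.shape(q_value))

# Example: privately release the mean of n records with ||x_i||_2 <= 1;
# replacing one record changes the mean by at most 2/n in l2 norm.
n, p = 1000, 20
x = np.random.default_rng(0).uniform(-1, 1, size=(n, p))
x /= np.maximum(1.0, np.linalg.norm(x, axis=1, keepdims=True))  # enforce ||x_i||_2 <= 1
private_mean = gaussian_mechanism(x.mean(axis=0), l2_sensitivity=2.0 / n, eps=1.0, delta=1e-5)
```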

3.2. Private Sparse Covariance Estimation

Let $x_1, x_2, \cdots, x_n$ be $n$ random samples from a $p$-variate distribution with covariance matrix $\Sigma = (\sigma_{ij})_{1\le i,j\le p}$, where the dimensionality $p$ is assumed to be high, i.e., $p \gg n \ge \mathrm{Poly}(\log p)$.

We define the parameter space of $s$-sparse covariance matrices as follows:

$$\mathcal{G}_0(s) = \{\Sigma = (\sigma_{ij})_{1\le i,j\le p} : \sigma_{-j,j} \text{ is } s\text{-sparse for all } j \in [p]\}, \qquad (1)$$

where $\sigma_{-j,j}$ denotes the $j$-th column of $\Sigma$ with the entry $\sigma_{jj}$ removed. That is, a matrix in $\mathcal{G}_0(s)$ has at most $s$ non-zero off-diagonal elements in each column.

We assume that each $x_i$ is sampled from a zero-mean sub-Gaussian distribution with parameter $\sigma^2$, that is,

$$\mathbb{E}[x_i]=0,\qquad \mathbb{P}\{|v^{T}x_i|>t\}\le e^{-\frac{t^2}{2\sigma^2}},\quad \forall t>0 \text{ and } \|v\|_2=1. \qquad (2)$$

This means that all one-dimensional marginals of $x_i$ have sub-Gaussian tails. We also assume that, with probability 1, $\|x_i\|_2\le 1$. We note that such assumptions are quite common in the differential privacy literature, such as [?].

Let $\mathcal{F}_p(\sigma^2, s)$ denote the set of distributions of $x_i$ satisfying all the above conditions (i.e., (2) and $\|x_i\|_2\le 1$) and with covariance matrix $\Sigma\in\mathcal{G}_0(s)$. The goal of private covariance estimation is to obtain an estimator $\Sigma^{\mathrm{priv}}$ of the underlying covariance matrix $\Sigma$ based on $\{x_1,\cdots,x_n\}\sim P\in\mathcal{F}_p(\sigma^2,s)$ while preserving its privacy. In this paper, we focus on $(\epsilon,\delta)$-differential privacy. We use the $\ell_2$ norm to measure the difference between $\Sigma^{\mathrm{priv}}$ and $\Sigma$, i.e., $\|\Sigma^{\mathrm{priv}}-\Sigma\|_2$.

Lemma 1 ([?]). Let $\{x_1,\cdots,x_n\}$ be $n$ random variables sampled from a Gaussian distribution $\mathcal{N}(0,\sigma^2)$. Then

$$\mathbb{E}\max_{1\le i\le n}|x_i|\le\sigma\sqrt{2\log 2n}, \qquad (3)$$

$$\mathbb{P}\Big\{\max_{1\le i\le n}|x_i|\ge t\Big\}\le 2n\,e^{-\frac{t^2}{2\sigma^2}}. \qquad (4)$$

In particular, if $n=1$, we have $\mathbb{P}\{|x_i|\ge t\}\le 2e^{-\frac{t^2}{2\sigma^2}}$.

Lemma 2 ([?]). If $\{x_1,x_2,\cdots,x_n\}$ are sampled from a sub-Gaussian distribution as in (2) and $\Sigma^*=(\sigma^*_{ij})_{1\le i,j\le p}=\frac{1}{n}\sum_{i=1}^{n}x_ix_i^{T}$ is the empirical covariance matrix, then there exist constants $C_1$ and $\gamma>0$ such that for all $i,j\in[p]$,

$$\mathbb{P}(|\sigma^*_{ij}-\sigma_{ij}|>t)\le C_1 e^{-\frac{nt^2}{8\gamma^2}} \qquad (5)$$

for all $|t|\le\xi$, where $C_1$, $\xi$ and $\gamma$ are constants depending only on $\sigma^2$. Specifically,

$$\mathbb{P}\Big\{|\sigma^*_{ij}-\sigma_{ij}|>\gamma\sqrt{\frac{\log p}{n}}\Big\}\le C_1 p^{-8}. \qquad (6)$$

Notation. All constants and big-$O$ notation throughout the paper omit factors that are polynomial in $\sigma^2$, the sub-Gaussian parameter. Many previous papers assume the sub-Gaussian parameter to be a constant, such as [? ?].

4. Method

4.1. A First Approach

A direct way to obtain a private estimator is to perturb the empirical covariance matrix by a symmetric Gaussian matrix, which has been used in previous work on private PCA, such as [? ?]. However, as we can see below, this method introduces a large error.

By [?], for any given $0<\epsilon,\delta\le 1$ and $\{x_1,x_2,\cdots,x_n\}\sim P\in\mathcal{F}_p(\sigma^2,s)$, the following perturbation procedure is $(\epsilon,\delta)$-differentially private:

$$\tilde{\Sigma}=\Sigma^*+N=(\tilde{\sigma}_{ij})_{1\le i,j\le p}=\frac{1}{n}\sum_{i=1}^{n}x_ix_i^{T}+N, \qquad (7)$$

where $N$ is a symmetric matrix whose upper triangle (including the diagonal) consists of i.i.d. samples from $\mathcal{N}(0,\sigma_1^2)$ with $\sigma_1^2=\frac{2\log(1.25/\delta)}{n^2\epsilon^2}$, and each lower triangle entry is copied from its upper triangle counterpart. By Corollary 2.3.6 of [?], we know that $\|N\|_2\le O(\sqrt{p}\,\sigma_1)=O\Big(\frac{\sqrt{p\log\frac{1}{\delta}}}{n\epsilon}\Big)$ with high probability. We can then easily get that, with high probability (i.e., with probability at least $1-\frac{1}{p^{c}}$ for some $c>0$),

$$\|\tilde{\Sigma}-\Sigma\|_2\le\|\Sigma^*-\Sigma\|_2+\|N\|_2\le O\Big(\frac{\sqrt{p\log\frac{1}{\delta}}}{n\epsilon}\Big), \qquad (8)$$

where the second inequality is due to a theorem in Chapter 1.6.3 of [?]. However, we can see that the upper bound of the error in (8) is quite large in the high dimensional case.

Another issue with the private estimator in (7) is that it is not clear whether it is positive semi-definite, a property that is normally expected of an estimator.
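For illustration, here is a short Python sketch of this first approach (our own illustrative code, not from the paper): it builds the empirical covariance and adds a symmetric Gaussian matrix with the noise scale $\sigma_1$ specified in (7).

```python
import numpy as np

def perturbed_covariance(x, eps, delta, rng=None):
    """Naive (eps, delta)-DP covariance release via symmetric Gaussian noise, Eq. (7).

    Assumes each row x[i] satisfies ||x[i]||_2 <= 1, matching the paper's setup.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = x.shape
    sigma_star = (x.T @ x) / n                 # empirical covariance Sigma*
    sigma1 = np.sqrt(2.0 * np.log(1.25 / delta)) / (n * eps)
    upper = np.triu(rng.normal(0.0, sigma1, size=(p, p)))  # upper triangle incl. diagonal
    noise = upper + np.triu(upper, 1).T        # mirror to make the matrix symmetric
    return sigma_star + noise
```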

4.2. Post-processing via Thresholding

We note that one reason the private estimator $\tilde{\Sigma}$ in (7) fails is that some entries of the error matrix are quite large, which makes $|\tilde{\sigma}_{ij}-\sigma_{ij}|$ large for some $i,j$. More precisely, by (6) and (4) we can get the following: with probability at least $1-Cp^{-6}$, for all $1\le i,j\le p$,

$$|\tilde{\sigma}_{ij}-\sigma_{ij}|\le\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}=O\Big(\gamma\sqrt{\frac{\log p}{n}}+\frac{\sqrt{\log p\log\frac{1}{\delta}}}{n\epsilon}\Big). \qquad (9)$$

For brevity, we denote the quantity in the middle of (9) by $\tau$, i.e., $\tau=\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}$.

Thus, to reduce the error, a natural approach is the following. For those $\sigma_{ij}$ with larger values, we keep the corresponding $\tilde{\sigma}_{ij}$, so that their difference stays below the threshold. For those $\sigma_{ij}$ with values small compared with (9), the corresponding $\tilde{\sigma}_{ij}$ may still be large, so by thresholding $\tilde{\sigma}_{ij}$ to 0 we can lower the error $|\tilde{\sigma}_{ij}-\sigma_{ij}|$.

Following this idea and the thresholding methods in [?] and [?], we propose the DP-Thresholding method, which post-processes the perturbed covariance matrix in (7) with the threshold $\tau$. After thresholding, we further threshold the eigenvalues of $\hat{\Sigma}$ in order to make it positive semi-definite. See Algorithm 1 for details.

Algorithm 1 DP-Thresholding

Input: $\{x_1,x_2,\cdots,x_n\}\sim P\in\mathcal{F}_p(\sigma^2,s)$, and $\epsilon,\delta\in(0,1)$.

1: Compute

$$\tilde{\Sigma}=(\tilde{\sigma}_{ij})_{1\le i,j\le p}=\frac{1}{n}\sum_{i=1}^{n}x_ix_i^{T}+N,$$

where $N$ is a symmetric matrix whose upper triangle (including the diagonal) consists of i.i.d. samples from $\mathcal{N}(0,\sigma_1^2)$ with $\sigma_1^2=\frac{2\log(1.25/\delta)}{n^2\epsilon^2}$, and each lower triangle entry is copied from its upper triangle counterpart.

2: Define the thresholding estimator $\hat{\Sigma}=(\hat{\sigma}_{ij})_{1\le i,j\le p}$ as

$$\hat{\sigma}_{ij}=\tilde{\sigma}_{ij}\cdot I\Big[|\tilde{\sigma}_{ij}|>\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}\Big]. \qquad (10)$$

3: Let the eigen-decomposition of $\hat{\Sigma}$ be $\hat{\Sigma}=\sum_{i=1}^{p}\lambda_i v_i v_i^{T}$. Let $\lambda_i^{+}=\max\{\lambda_i,0\}$ be the positive part of $\lambda_i$, and define $\Sigma^{+}=\sum_{i=1}^{p}\lambda_i^{+}v_iv_i^{T}$.

4: return $\Sigma^{+}$.
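To make the procedure concrete, here is a compact, self-contained Python sketch of Algorithm 1 under the stated assumptions ($\|x_i\|_2\le 1$); the sub-Gaussian constant $\gamma$ from Lemma 2 is treated as a user-supplied parameter, since in practice it is not known exactly.

```python
import numpy as np

def dp_thresholding(x, eps, delta, gamma=1.0, rng=None):
    """Sketch of Algorithm 1 (DP-Thresholding).

    x: (n, p) array with ||x[i]||_2 <= 1; gamma: assumed sub-Gaussian constant.
    Returns a PSD, (eps, delta)-DP estimate of the covariance matrix.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = x.shape

    # Step 1: perturb the empirical covariance with a symmetric Gaussian matrix.
    sigma1 = np.sqrt(2.0 * np.log(1.25 / delta)) / (n * eps)
    upper = np.triu(rng.normal(0.0, sigma1, size=(p, p)))
    sigma_tilde = (x.T @ x) / n + upper + np.triu(upper, 1).T

    # Step 2: entrywise hard thresholding at the level tau of Eq. (10).
    tau = (gamma * np.sqrt(np.log(p) / n)
           + 4.0 * np.sqrt(2.0 * np.log(1.25 / delta)) * np.sqrt(np.log(p)) / (n * eps))
    sigma_hat = np.where(np.abs(sigma_tilde) > tau, sigma_tilde, 0.0)

    # Step 3: project onto the PSD cone by zeroing out negative eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T
```

Since both the entrywise thresholding and the eigenvalue truncation only post-process the already-private $\tilde{\Sigma}$, the returned matrix satisfies the same $(\epsilon,\delta)$-DP guarantee (cf. Theorem 1).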

Theorem 1. For any $0<\epsilon,\delta\le 1$, Algorithm 1 is $(\epsilon,\delta)$-differentially private.

Proof. By Section 3 of [?], we know that Step 1 makes the matrix $(\epsilon,\delta)$-differentially private. Thus, Algorithm 1 is $(\epsilon,\delta)$-differentially private due to the post-processing property of differential privacy [?].

For the matrix $\hat{\Sigma}$ in (10) after the first step of thresholding, we have the following key lemma.

Lemma 3. For every fixed $1\le i,j\le p$, there exists a constant $C>0$ such that with probability at least $1-Cp^{-9/2}$, the following holds:

$$|\hat{\sigma}_{ij}-\sigma_{ij}|\le 4\min\{|\sigma_{ij}|,\ \tau\}, \qquad (11)$$

where $\tau=\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}$ is the threshold in (10).

Proof of Lemma 3. Let $\Sigma^*=(\sigma^*_{ij})_{1\le i,j\le p}$ and $N=(n_{ij})_{1\le i,j\le p}$. Define the event $A_{ij}=\{|\tilde{\sigma}_{ij}|>\tau\}$. We have:

$$|\hat{\sigma}_{ij}-\sigma_{ij}|=|\sigma_{ij}|\cdot I(A^{c}_{ij})+|\tilde{\sigma}_{ij}-\sigma_{ij}|\cdot I(A_{ij}). \qquad (12)$$

By the triangle inequality, it is easy to see that

$$A_{ij}=\big\{|\tilde{\sigma}_{ij}-\sigma_{ij}+\sigma_{ij}|>\tau\big\}\subset\big\{|\tilde{\sigma}_{ij}-\sigma_{ij}|>\tau-|\sigma_{ij}|\big\}$$

and

$$A^{c}_{ij}=\big\{|\tilde{\sigma}_{ij}-\sigma_{ij}+\sigma_{ij}|\le\tau\big\}\subset\big\{|\tilde{\sigma}_{ij}-\sigma_{ij}|\ge|\sigma_{ij}|-\tau\big\}.$$

Depending on the value of $\sigma_{ij}$, we have the following three cases.

Case 1. $|\sigma_{ij}|\le\frac{\gamma}{4}\sqrt{\frac{\log p}{n}}+\frac{\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}=\frac{\tau}{4}$. In this case, we have

$$\mathbb{P}(A_{ij})\le\mathbb{P}\Big(|\tilde{\sigma}_{ij}-\sigma_{ij}|>\frac{3\gamma}{4}\sqrt{\frac{\log p}{n}}+\frac{3\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}\Big)\le C_1p^{-9/2}+2p^{-9/2}. \qquad (13)$$

This is due to the following:

$$\mathbb{P}\Big(|\tilde{\sigma}_{ij}-\sigma_{ij}|>\frac{3\gamma}{4}\sqrt{\frac{\log p}{n}}+\frac{3\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}\Big) \qquad (14)$$
$$\le\mathbb{P}\Big(|\sigma^{*}_{ij}-\sigma_{ij}|>\frac{3\gamma}{4}\sqrt{\frac{\log p}{n}}+\frac{3\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}-|n_{ij}|\Big) \qquad (15)$$
$$=\mathbb{P}\Big(B_{ij}\cap\Big\{\frac{3\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}-|n_{ij}|>0\Big\}\Big) \qquad (16)$$
$$\quad+\mathbb{P}\Big(B_{ij}\cap\Big\{\frac{3\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}-|n_{ij}|\le 0\Big\}\Big) \qquad (17)$$
$$\le\mathbb{P}\Big(|\sigma^{*}_{ij}-\sigma_{ij}|>\frac{3\gamma}{4}\sqrt{\frac{\log p}{n}}\Big)+\mathbb{P}\Big(\frac{3\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}\le|n_{ij}|\Big) \qquad (18)$$
$$\le C_1p^{-9/2}+2p^{-9/2}, \qquad (19)$$

where $B_{ij}$ denotes the event $B_{ij}=\Big\{|\sigma^{*}_{ij}-\sigma_{ij}|>\frac{3\gamma}{4}\sqrt{\frac{\log p}{n}}+\frac{3\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}-|n_{ij}|\Big\}$, and the last inequality is due to (5) and (4).

Thus by (12), with probability at least $1-C_1p^{-9/2}-2p^{-9/2}$, we have $|\hat{\sigma}_{ij}-\sigma_{ij}|=|\sigma_{ij}|$, which satisfies (11).

Case 2. $|\sigma_{ij}|\ge 2\gamma\sqrt{\frac{\log p}{n}}+\frac{8\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}=2\tau$. In this case, we have

$$\mathbb{P}(A^{c}_{ij})\le\mathbb{P}\Big(|\tilde{\sigma}_{ij}-\sigma_{ij}|\ge\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}\Big)\le C_1p^{-9/2}+2p^{-8},$$

where the proof is the same as that of (13)-(19). Thus, with probability at least $1-C_1p^{-9/2}-2p^{-8}$, we have

$$|\hat{\sigma}_{ij}-\sigma_{ij}|=|\tilde{\sigma}_{ij}-\sigma_{ij}|. \qquad (20)$$

Also, by (9), (11) also holds.

Case 3. Otherwise, $\frac{\gamma}{4}\sqrt{\frac{\log p}{n}}+\frac{\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}\le|\sigma_{ij}|\le 2\gamma\sqrt{\frac{\log p}{n}}+\frac{8\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}$. In this case, we have

$$|\hat{\sigma}_{ij}-\sigma_{ij}|=|\sigma_{ij}|\quad\text{or}\quad|\tilde{\sigma}_{ij}-\sigma_{ij}|. \qquad (21)$$

When $|\sigma_{ij}|\le\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}=\tau$, we can see from (9) that with probability at least $1-2p^{-6}-C_1p^{-8}$,

$$|\tilde{\sigma}_{ij}-\sigma_{ij}|\le\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}\le 4|\sigma_{ij}|.$$

Thus, (11) also holds. Otherwise, when $|\sigma_{ij}|\ge\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}$, (11) also holds. Thus, Lemma 3 is true. □

By Lemma 3, we have the following upper bound on the $\ell_2$-norm error of $\Sigma^{+}$.

Theorem 2. The output $\Sigma^{+}$ of Algorithm 1 satisfies:

$$\mathbb{E}\|\Sigma^{+}-\Sigma\|_2^2=O\Big(\frac{s^2\log p}{n}+\frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2}+\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\Big), \qquad (22)$$

where the expectation is taken over the coins of the algorithm and the randomness of $\{x_1,x_2,\cdots,x_n\}$.

Proof of Theorem 2. We first show that $\|\Sigma^{+}-\Sigma\|_2\le 2\|\hat{\Sigma}-\Sigma\|_2$. This is due to the following:

$$\|\Sigma^{+}-\Sigma\|_2\le\|\Sigma^{+}-\hat{\Sigma}\|_2+\|\hat{\Sigma}-\Sigma\|_2\le\max_{i:\lambda_i\le 0}|\lambda_i|+\|\hat{\Sigma}-\Sigma\|_2$$
$$\le\max_{i:\lambda_i\le 0}|\lambda_i-\lambda_i(\Sigma)|+\|\hat{\Sigma}-\Sigma\|_2\le 2\|\hat{\Sigma}-\Sigma\|_2,$$

where the third inequality is due to the fact that $\Sigma$ is positive semi-definite, and the last one follows from Weyl's inequality.

This means that we only need to bound $\|\hat{\Sigma}-\Sigma\|_2$. Since $\hat{\Sigma}-\Sigma$ is symmetric, we know that $\|\hat{\Sigma}-\Sigma\|_2\le\|\hat{\Sigma}-\Sigma\|_1$ [?]. Thus, it suffices to prove that the bound in (22) holds for $\|\hat{\Sigma}-\Sigma\|_1$.

We define the event $E_{ij}$ as

$$E_{ij}=\big\{|\hat{\sigma}_{ij}-\sigma_{ij}|\le 4\min\{|\sigma_{ij}|,\ \tau\}\big\}. \qquad (23)$$

Then, by Lemma 3, we have $\mathbb{P}(E_{ij})\ge 1-2C_1p^{-9/2}$.

Let $D=(d_{ij})_{1\le i,j\le p}$, where $d_{ij}=(\hat{\sigma}_{ij}-\sigma_{ij})\cdot I(E^{c}_{ij})$. Then, we have

$$\|\hat{\Sigma}-\Sigma\|_1^2\le\|\hat{\Sigma}-\Sigma-D+D\|_1^2\le 2\|\hat{\Sigma}-\Sigma-D\|_1^2+2\|D\|_1^2$$
$$\le 4\Big(\sup_{j}\sum_{i\ne j}|\hat{\sigma}_{ij}-\sigma_{ij}|I(E_{ij})\Big)^2+2\|D\|_1^2+O\Big(\frac{\gamma^2\log p}{n}+\frac{\log p\log\frac{1}{\delta}}{n^2\epsilon^2}\Big). \qquad (24)$$

We first bound the first term of (24). By the definition of $E_{ij}$ and Lemma 3, we can upper bound it by

$$\Big(\sup_{j}\sum_{i\ne j}|\hat{\sigma}_{ij}-\sigma_{ij}|I(E_{ij})\Big)^2\le 16\Big(\sup_{j}\sum_{i\ne j}\min\{|\sigma_{ij}|,\ \tau\}\Big)^2\le 16s^2\tau^2\le O\Big(\frac{s^2\gamma^2\log p}{n}+\frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2}\Big), \qquad (25)$$

where the second inequality is due to the assumption that at most $s$ elements of $(\sigma_{ij})_{i\ne j}$ are non-zero in each column.

For the second term in (24), we have

$$\mathbb{E}\|D\|_1^2\le p\sum_{ij}\mathbb{E}d_{ij}^2\le p\,\mathbb{E}\sum_{ij}\Big[(\hat{\sigma}_{ij}-\sigma_{ij})^2 I\big(E^{c}_{ij}\cap\{\hat{\sigma}_{ij}=\tilde{\sigma}_{ij}\}\big)+(\hat{\sigma}_{ij}-\sigma_{ij})^2 I\big(E^{c}_{ij}\cap\{\hat{\sigma}_{ij}=0\}\big)\Big]$$
$$\le p\,\mathbb{E}\sum_{ij}(\tilde{\sigma}_{ij}-\sigma_{ij})^2 I(E^{c}_{ij})+p\sum_{ij}\mathbb{E}\,\sigma_{ij}^2 I\big(E^{c}_{ij}\cap\{\hat{\sigma}_{ij}=0\}\big). \qquad (26)$$

For the first term in (26), we have

$$p\sum_{ij}\mathbb{E}\big\{(\tilde{\sigma}_{ij}-\sigma_{ij})^2 I(E^{c}_{ij})\big\}\le p\sum_{ij}\big[\mathbb{E}(\tilde{\sigma}_{ij}-\sigma_{ij})^6\big]^{\frac{1}{3}}\,\mathbb{P}^{\frac{2}{3}}(E^{c}_{ij}) \qquad (27)$$
$$\le Cp\cdot p^2\,\frac{\log\frac{1}{\delta}}{n^2\epsilon^2}\,p^{-3}=O\Big(\frac{\log\frac{1}{\delta}}{n^2\epsilon^2}\Big),$$

where the first inequality is due to Hölder's inequality and the second is due to the fact that, for some constant $C_3>0$,

$$\mathbb{E}(\tilde{\sigma}_{ij}-\sigma_{ij})^6\le C_3\big[\mathbb{E}(\sigma^{*}_{ij}-\sigma_{ij})^6+\mathbb{E}\,n_{ij}^6\big].$$

Since $n_{ij}$ is Gaussian, we have $\mathbb{E}\,n_{ij}^6\le C_4\sigma_1^6=O\Big(\big(\frac{\log\frac{1}{\delta}}{n^2\epsilon^2}\big)^3\Big)$ for some constant $C_4$ [?]. For the first term $\mathbb{E}(\sigma^{*}_{ij}-\sigma_{ij})^6$, since $x_i$ is sampled from the sub-Gaussian distribution (2), by the Whittle inequality (Theorem 2 in [?] or [?]), the quadratic form $\sigma^{*}_{ij}$ satisfies $\mathbb{E}(\sigma^{*}_{ij}-\sigma_{ij})^6\le\frac{C_5}{n^6}$ for some positive constant $C_5>0$.

For the second term of (26), we have

$$p\sum_{ij}\mathbb{E}\,\sigma_{ij}^2 I\big(E^{c}_{ij}\cap\{\hat{\sigma}_{ij}=0\}\big)=p\sum_{ij}\mathbb{E}\,\sigma_{ij}^2\,I\big(|\sigma_{ij}|>4\tau\big)\,I\big(|\tilde{\sigma}_{ij}|\le\tau\big)$$
$$\le p\sum_{ij}\mathbb{E}\,\sigma_{ij}^2\,I\big(|\sigma_{ij}|>4\tau\big)\,I\big(|\sigma_{ij}|-|\tilde{\sigma}_{ij}-\sigma_{ij}|\le\tau\big)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{E}\,I\big(|\sigma_{ij}|>4\tau\big)\,I\Big(|\tilde{\sigma}_{ij}-\sigma_{ij}|\ge\frac{3}{4}|\sigma_{ij}|\Big)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{E}\,I\big(|\sigma_{ij}|>4\tau\big)\,I\Big(|\sigma^{*}_{ij}-\sigma_{ij}|+|n_{ij}|\ge\frac{3}{4}|\sigma_{ij}|\Big)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|\sigma^{*}_{ij}-\sigma_{ij}|\ge\frac{3}{4}|\sigma_{ij}|-|n_{ij}|\Big\}\cap\big\{|\sigma_{ij}|>4\tau\big\}\Big) \qquad (28)$$
$$=p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|\sigma^{*}_{ij}-\sigma_{ij}|\ge\frac{3}{4}|\sigma_{ij}|-|n_{ij}|\Big\}\cap\Big\{|n_{ij}|\le\frac{1}{4}|\sigma_{ij}|\Big\}\cap\big\{|\sigma_{ij}|>4\tau\big\}\Big)$$
$$\quad+p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|\sigma^{*}_{ij}-\sigma_{ij}|\ge\frac{3}{4}|\sigma_{ij}|-|n_{ij}|\Big\}\cap\Big\{|n_{ij}|\ge\frac{1}{4}|\sigma_{ij}|\Big\}\cap\big\{|\sigma_{ij}|>4\tau\big\}\Big) \qquad (29)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|\sigma^{*}_{ij}-\sigma_{ij}|\ge\frac{1}{2}|\sigma_{ij}|\Big\}\cap\big\{|\sigma_{ij}|>4\tau\big\}\Big)+p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|n_{ij}|\ge\frac{1}{4}|\sigma_{ij}|\Big\}\cap\big\{|\sigma_{ij}|>4\tau\big\}\Big), \qquad (30)$$

where $\tau=\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}$ as before.

For the second term of (30), by Lemmas 1 and 2 we have

$$p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|n_{ij}|\ge\frac{1}{4}|\sigma_{ij}|\Big\}\cap\big\{|\sigma_{ij}|>4\tau\big\}\Big)$$
$$\le p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(|n_{ij}|\ge\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{n\epsilon}\Big)\,\mathbb{P}\Big(|n_{ij}|>\frac{1}{4}\sigma_{ij}\Big)$$
$$\le Cp\sum_{ij}\sigma_{ij}^2\exp\Big(-\frac{\big(\gamma\sqrt{\frac{\log p}{n}}+4\sigma_1\sqrt{\log p}\big)^2}{2\sigma_1^2}\Big)\exp\Big(-\frac{\sigma_{ij}^2}{32\sigma_1^2}\Big)$$
$$\le Cp\sum_{ij}\sigma_{ij}^2\exp\Big(-\frac{\big(\gamma\sqrt{\frac{\log p}{n}}+4\sigma_1\sqrt{\log p}\big)^2}{2\sigma_1^2}\Big)\frac{32\sigma_1^2}{\sigma_{ij}^2}$$
$$\le C\sigma_1^2\,p\cdot p^2\exp\Big(-\frac{\gamma^2\log p}{2n\sigma_1^2}\Big)p^{-8} \qquad (31)$$
$$\le C\sigma_1^2\,p^{-5}\Big(\frac{2n\sigma_1^2}{\gamma^2\log p}\Big)^2=O\Big(\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\Big). \qquad (32)$$

For the first term of (30), by Lemma 2 we have

$$p\sum_{ij}\sigma_{ij}^2\,\mathbb{P}\Big(\Big\{|\sigma^{*}_{ij}-\sigma_{ij}|\ge\frac{1}{2}|\sigma_{ij}|\Big\}\cap\Big\{|\sigma_{ij}|\ge 4\gamma\sqrt{\frac{\log p}{n}}\Big\}\Big)$$
$$\le\frac{p}{n}\sum_{ij}n\sigma_{ij}^2\exp\Big(-\frac{2n\sigma_{ij}^2}{\gamma^2}\Big)I\Big(|\sigma_{ij}|\ge 4\gamma\sqrt{\frac{\log p}{n}}\Big)$$
$$=\frac{p}{n}\sum_{ij}\Big[n\sigma_{ij}^2\exp\Big(-\frac{n\sigma_{ij}^2}{\gamma^2}\Big)\Big]\exp\Big(-\frac{n\sigma_{ij}^2}{\gamma^2}\Big)I\Big(|\sigma_{ij}|\ge 4\gamma\sqrt{\frac{\log p}{n}}\Big)$$
$$\le\frac{p}{n}\sum_{ij}n\sigma_{ij}^2\,\frac{\gamma^2}{n\sigma_{ij}^2}\exp(-16\log p) \qquad (33)$$
$$\le\frac{C\gamma^2 p^3}{n}\,p^{-16}=O\Big(\frac{1}{n}\Big). \qquad (34)$$

Thus, in total we have $\mathbb{E}\|D\|_1^2=O\big(\frac{\log\frac{1}{\delta}}{n^2\epsilon^2}+\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}+\frac{1}{n}\big)$. This means that

$$\mathbb{E}\|\hat{\Sigma}-\Sigma\|_1^2=O\Big(\frac{s^2\log p}{n}+\frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2}+\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\Big),$$

which completes the proof. □

Corollary 1. For any $1\le w\le\infty$, the matrix $\hat{\Sigma}$ in (10) after the first step of thresholding satisfies

$$\|\hat{\Sigma}-\Sigma\|_w^2\le O\Big(\frac{s^2\log p}{n}+\frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2}+\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\Big), \qquad (35)$$

where the $w$-norm of a matrix $A$ is defined as $\|A\|_w=\sup_{x\ne 0}\frac{\|Ax\|_w}{\|x\|_w}$. Specifically, for a matrix $A=(a_{ij})_{1\le i,j\le p}$, $\|A\|_1=\sup_j\sum_i|a_{ij}|$ is the maximum absolute column sum, and $\|A\|_\infty=\sup_i\sum_j|a_{ij}|$ is the maximum absolute row sum.

Comparing the bound in the above corollary with the optimal minimax rate $\Theta\big(\frac{s^2\log p}{n}\big)$ in [?] for the non-private case, we can see that the impact of differential privacy is an additional error of $O\big(\frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2}+\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\big)$. It is an open problem to determine whether the bound in Theorem 2 is tight.

Proof of Corollary 1. By the Riesz-Thorin interpolation theorem [?], we have

$$\|A\|_w\le\max\{\|A\|_1,\|A\|_2,\|A\|_\infty\}$$

for any matrix $A$ and any $1\le w\le\infty$. Since $\Sigma^{+}-\Sigma$ is a symmetric matrix, we have $\|\Sigma^{+}-\Sigma\|_2\le\|\Sigma^{+}-\Sigma\|_1$ and $\|\Sigma^{+}-\Sigma\|_1=\|\Sigma^{+}-\Sigma\|_\infty$. Thus, the corollary follows from the proof of Theorem 2. □

4.3. Extension to Local Differential Privacy

One advantage of our Algorithm 1 is that it can be easily extended to the local differential privacy (LDP) model.

Differential privacy in the local model. In LDP, we have a data universe $\mathcal{X}$, $n$ players, each holding a private data record $x_i\in\mathcal{X}$, and a server that is in charge of coordinating the protocol. An LDP protocol proceeds in $T$ rounds. In each round, the server sends a message, sometimes called a query, to a subset of the players, requesting them to run a particular algorithm. Based on the queries, each player $i$ in the subset selects an algorithm $Q_i$, runs it on her data, and sends the output back to the server.

Definition 3 ([?]). An algorithm $Q$ is $(\epsilon,\delta)$-locally differentially private (LDP) if for all pairs $x,x'\in\mathcal{X}$, and for all events $E$ in the output space of $Q$, we have

$$\mathbb{P}[Q(x)\in E]\le e^{\epsilon}\,\mathbb{P}[Q(x')\in E]+\delta.$$

A multi-player protocol is $(\epsilon,\delta)$-LDP if for all possible inputs and runs of the protocol, the transcript of each player's interaction with the server is $(\epsilon,\delta)$-LDP. If $T=1$, we say that the protocol is $(\epsilon,\delta)$ non-interactive LDP.

Algorithm 2 LDP-Thresholding

Input: $\{x_1,x_2,\cdots,x_n\}\sim P\in\mathcal{F}_p(\sigma^2,s)$, and $\epsilon,\delta\in(0,1)$.

1: for each $i\in[n]$ do

2: Denote $\widetilde{x_ix_i^{T}}=x_ix_i^{T}+z_i$, where $z_i\in\mathbb{R}^{p\times p}$ is a symmetric matrix whose upper triangle (including the diagonal) consists of i.i.d. samples from $\mathcal{N}(0,\sigma_2^2)$ with $\sigma_2^2=\frac{2\log(1.25/\delta)}{\epsilon^2}$, and each lower triangle entry is copied from its upper triangle counterpart.

3: end for

4: Compute $\tilde{\Sigma}=(\tilde{\sigma}_{ij})_{1\le i,j\le p}=\frac{1}{n}\sum_{i=1}^{n}\widetilde{x_ix_i^{T}}$.

5: Define the thresholding estimator $\hat{\Sigma}=(\hat{\sigma}_{ij})_{1\le i,j\le p}$ as

$$\hat{\sigma}_{ij}=\tilde{\sigma}_{ij}\cdot I\Big[|\tilde{\sigma}_{ij}|>\gamma\sqrt{\frac{\log p}{n}}+\frac{4\sqrt{2\log(1.25/\delta)}\sqrt{\log p}}{\sqrt{n}\,\epsilon}\Big]. \qquad (36)$$

6: Let the eigen-decomposition of $\hat{\Sigma}$ be $\hat{\Sigma}=\sum_{i=1}^{p}\lambda_i v_i v_i^{T}$. Let $\lambda_i^{+}=\max\{\lambda_i,0\}$ be the positive part of $\lambda_i$, and define $\Sigma^{+}=\sum_{i=1}^{p}\lambda_i^{+}v_iv_i^{T}$.

7: return $\Sigma^{+}$.
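A minimal Python sketch of Algorithm 2 follows (again our own illustration; note that each report is computed from a single record, so the per-user noise scale $\sigma_2$ does not shrink with $n$):

```python
import numpy as np

def ldp_report(x_i, eps, delta, rng):
    """One user's (eps, delta)-LDP report: her rank-one matrix plus symmetric noise."""
    p = x_i.shape[0]
    sigma2 = np.sqrt(2.0 * np.log(1.25 / delta)) / eps   # per-user noise std
    upper = np.triu(rng.normal(0.0, sigma2, size=(p, p)))
    return np.outer(x_i, x_i) + upper + np.triu(upper, 1).T

def ldp_thresholding(x, eps, delta, gamma=1.0, rng=None):
    """Sketch of Algorithm 2: aggregate LDP reports, threshold, project to PSD."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = x.shape
    sigma_tilde = sum(ldp_report(x[i], eps, delta, rng) for i in range(n)) / n
    tau = (gamma * np.sqrt(np.log(p) / n)
           + 4.0 * np.sqrt(2.0 * np.log(1.25 / delta)) * np.sqrt(np.log(p))
             / (np.sqrt(n) * eps))
    sigma_hat = np.where(np.abs(sigma_tilde) > tau, sigma_tilde, 0.0)
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T
```

Averaging the $n$ reports reduces the effective noise standard deviation by a factor of $\sqrt{n}$, which is why the threshold in (36) carries $\sqrt{n}\,\epsilon$ rather than $n\epsilon$ in its denominator.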

Inspired by Algorithm 1, it is easy to extend our DP algorithm to the LDP model. The idea is that each player perturbs her rank-one matrix $x_ix_i^{T}$ locally, and the server aggregates the noisy versions; see Algorithm 2 for details.

The following theorem shows the error bound for the output of Algorithm 2, whose proof is almost the same as that of Theorem 2.

Theorem 3. The output $\Sigma^{+}$ of Algorithm 2 satisfies:

$$\mathbb{E}\|\Sigma^{+}-\Sigma\|_2^2=O\Big(\frac{s^2\log p\log\frac{1}{\delta}}{n\epsilon^2}\Big), \qquad (37)$$

where the expectation is taken over the coins of the algorithm and the randomness of $\{x_1,x_2,\cdots,x_n\}$. Moreover, $\hat{\Sigma}$ in (36) satisfies $\|\hat{\Sigma}-\Sigma\|_w^2=O\big(\frac{s^2\log p\log\frac{1}{\delta}}{n\epsilon^2}\big)$.

Compared with the upper bound of $O\big(\frac{s^2\log p}{n}+\frac{s^2\log p\log\frac{1}{\delta}}{n^2\epsilon^2}+\frac{\log^2\frac{1}{\delta}}{n^2\epsilon^4}\big)$ in the central $(\epsilon,\delta)$-DP model, we can see that the upper bound of $O\big(\frac{s^2\log p\log\frac{1}{\delta}}{n\epsilon^2}\big)$ in the local model is much larger. We also note that the upper bound in the local model is tight, matching a lower bound recently given in [?].

5. Experiments

In this section, we evaluate the practical performance of Algorithms 1 and 2 on synthetic datasets.

Data Generation. We first generate a symmetric sparse matrix $\tilde{U}$ with sparsity ratio $sr$; that is, the matrix has $sr\times p\times p$ non-zero entries. Then, we let $U=\tilde{U}+\lambda I_p$ for some constant $\lambda$ to make $U$ positive semi-definite, and then scale it as $U=\frac{U}{c}$ for some constant $c$ which makes the norm of the samples less than 1 (with high probability)¹. Finally, we sample $\{x_1,\cdots,x_n\}$ from the multivariate Gaussian distribution $\mathcal{N}(0,U)$. In this paper, we set $\lambda=50$ and $c=200$.
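A Python sketch of this data-generation procedure (the function name and seeding are ours):

```python
import numpy as np

def generate_data(n, p, sr, lam=50.0, c=200.0, seed=0):
    """Generate n samples from N(0, U) with a sparse, shifted, rescaled covariance U."""
    rng = np.random.default_rng(seed)
    # Symmetric sparse matrix with roughly sr * p * p non-zero entries.
    mask = np.triu(rng.random((p, p)) < sr)
    u_tilde = np.where(mask, rng.normal(0.0, 1.0, size=(p, p)), 0.0)
    u_tilde = u_tilde + np.triu(u_tilde, 1).T
    # Shift by lam * I toward PSD (lam is large enough in the paper's settings),
    # then rescale by c so that sample norms stay below 1 with high probability.
    u = (u_tilde + lam * np.eye(p)) / c
    x = rng.multivariate_normal(np.zeros(p), u, size=n)
    return x, u
```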

Experimental Settings. To measure the performance, we report the $\ell_2$ and $\ell_1$ norms of the relative error, i.e., $\frac{\|\Sigma^{+}-U\|_2}{\|U\|_2}$ and $\frac{\|\Sigma^{+}-U\|_1}{\|U\|_1}$, against the sample size $n$ in three different settings: 1) We set $p=100$, $\epsilon=1$, $\delta=\frac{1}{n}$ and vary the sparsity ratio $sr\in\{0.1,0.2,0.3,0.5\}$. 2) We set $\epsilon=1$, $\delta=\frac{1}{n}$, $sr=0.2$, and let the dimensionality $p$ vary in $\{50,100,200,500\}$. 3) We fix $p=200$, $\delta=\frac{1}{n}$, $sr=0.2$ and vary the privacy level $\epsilon\in\{0.1,0.5,1,2\}$. We run each experiment 20 times and report the average error.
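Given the sketches above, the reported quantities can be computed as follows (illustrative; `generate_data` and `dp_thresholding` are the hypothetical helpers defined earlier, not code from the paper):

```python
import numpy as np

def relative_errors(n, p, sr, eps, n_trials=20):
    """Average l2 and l1 relative errors of DP-Thresholding over repeated trials."""
    errs = np.zeros((n_trials, 2))
    for t in range(n_trials):
        x, u = generate_data(n, p, sr, seed=t)
        sigma_plus = dp_thresholding(x, eps=eps, delta=1.0 / n)
        errs[t, 0] = np.linalg.norm(sigma_plus - u, 2) / np.linalg.norm(u, 2)
        errs[t, 1] = np.linalg.norm(sigma_plus - u, 1) / np.linalg.norm(u, 1)
    return errs.mean(axis=0)
```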

Experimental Results. Figures ?? and ?? show the results of DP-Thresholding (Algorithm 1) with $\ell_2$ and $\ell_1$ relative error, respectively. Figures ?? and ?? show the results of LDP-Thresholding (Algorithm 2) with $\ell_2$ and $\ell_1$ relative error, respectively. From the figures we can see the following. 1) If the sparsity ratio is large, i.e., the underlying covariance matrix is denser, the relative error is larger; this is due to the fact that the error depends on the sparsity $s$, as shown in Theorems 2 and 3. 2) The dimensionality only slightly affects the relative error. That is, even if we double the value of $p$, the error increases only slightly. This is consistent with our theoretical analysis in Theorems 2 and 3, which shows that the error of our private estimators depends only logarithmically on $p$ (i.e., $\log p$). 3) As the privacy parameter $\epsilon$ increases (which means a weaker privacy guarantee), the relative error decreases, which is also consistent with Theorems 2 and 3.

¹Although the distribution is not bounded by 1, as we can see from the previous section, we obtain the same result as long as the $\ell_2$ norm of the samples is bounded by 1.
