LEARNING POWER SYSTEM PARAMETERS FROM LINEAR MEASUREMENTS
6.3 Fundamental Trade-offs
information $I(G; A \mid B)$ from above by $H(A) - H(Z)$. The general converse stated in Theorem 6.3.1 is used to establish the results on worst-case sample complexity in Theorem 6.4.1. Next, we analyze sufficient conditions for recovering a graph matrix $Y(G)$. Before proceeding to the results, we introduce a novel characterization of the distribution $\mathcal{G}_n$ from which a graph $G$ is sampled. In particular, the graph $G$ is allowed to have high-degree nodes.
Characterization of Graph Distributions
Let $d_i(G)$ denote the degree of node $i \in \mathcal{V}$ in $G$. Denote by $\mathcal{V}_{\mathrm{Large}}(d) := \{i \in \mathcal{V} : d_i(G) > d\}$ the set of nodes having degrees greater than the threshold parameter $0 \le d \le n-2$, and by $\mathcal{V}_{\mathrm{Small}}(d) := \mathcal{V} \setminus \mathcal{V}_{\mathrm{Large}}(d)$ the set of remaining nodes, whose corresponding columns of $Y$ are $d$-sparse. With a counting parameter $0 \le K \le n$, we define the set of graphs in which no more than $K$ nodes have degree larger than $d$, denoted by
$$\mathcal{C}(n, d, K) := \{G \in \mathcal{C}(n) : |\mathcal{V}_{\mathrm{Large}}(d)| \le K\}.$$
The following definition characterizes graph distributions.
Definition 6.3.1 ($(d, K, \epsilon)$-sparse distribution). A graph distribution $\mathcal{G}_n$ is said to be $(d, K, \epsilon)$-sparse if, assuming that $G$ is distributed according to $\mathcal{G}_n$, the probability that $G$ belongs to $\mathcal{C}(n, d, K)$ is at least $1-\epsilon$, i.e.,
$$\mathbb{P}_{\mathcal{G}_n}\bigl(G \in \mathcal{C}(n, d, K)\bigr) \ge 1 - \epsilon. \qquad (6.10)$$
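To make the definition concrete, here is a minimal sketch (our own illustration; the function name and adjacency-matrix input are assumptions, not part of the original formulation) that tests membership of a graph in $\mathcal{C}(n, d, K)$ by counting high-degree nodes:

```python
import numpy as np

def in_C(adj: np.ndarray, d: int, K: int) -> bool:
    """Check whether the graph with adjacency matrix `adj` lies in C(n, d, K),
    i.e., whether at most K of its nodes have degree strictly greater than d."""
    degrees = adj.sum(axis=1)            # degree d_i(G) of every node i
    n_large = int((degrees > d).sum())   # |V_Large(d)|
    return n_large <= K

# Example: a star on 5 nodes has one node of degree 4 and four of degree 1,
# so it lies in C(5, 1, 1) but not in C(5, 1, 0).
star = np.zeros((5, 5), dtype=int)
star[0, 1:] = star[1:, 0] = 1
assert in_C(star, d=1, K=1) and not in_C(star, d=1, K=0)
```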
1) Uniform Sampling of Trees:
Based on the definition above, we can find the associated parameters for particular graph distributions. We exemplify this by considering the two graph distributions introduced in Example 7. Denote by $\mathcal{U}_{\mathcal{T}}(n)$ the uniform distribution on the set $\mathcal{T}(n)$ of all trees with $n$ nodes.
Lemma 22. For any $d \ge 1$ and $K > 0$, the distribution $\mathcal{U}_{\mathcal{T}}(n)$ is $(d, K, 1/K)$-sparse.
2) Erdős-Rényi $(n, p)$ model:
Denote by $\mathcal{G}_{\mathrm{ER}}(n, p)$ the graph distribution of the Erdős-Rényi $(n, p)$ model. Similarly, the lemma below classifies $\mathcal{G}_{\mathrm{ER}}(n, p)$ as a $(d, K, \epsilon)$-sparse distribution with appropriate parameters.
Lemma 23. For any $d(n, p)$ that satisfies $d(n, p) \ge 2np/\ln(1/p)$ and any $K > 0$, the distribution $\mathcal{G}_{\mathrm{ER}}(n, p)$ is $\bigl(d, K, n\exp(-np)/K\bigr)$-sparse.
The proofs of Lemmas 22 and 23 are in Appendix 6.D.
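To build intuition for these parameters, the following Monte Carlo sketch (our own illustration, not part of the analysis) estimates $\epsilon = \mathbb{P}(G \notin \mathcal{C}(n, d, K))$ empirically for both distributions. Uniform trees are sampled via Prüfer sequences, where node $i$ has degree equal to its number of occurrences in the sequence plus one; the specific values of $n$, $p$, $d$, and $K$ below are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def degrees_uniform_tree(n):
    """Degrees of a uniform random labeled tree on n nodes.
    A uniform Pruefer sequence of length n-2 gives degree = count + 1."""
    seq = rng.integers(0, n, size=n - 2)
    return np.bincount(seq, minlength=n) + 1

def degrees_er(n, p):
    """Degrees of an Erdos-Renyi G(n, p) sample."""
    upper = rng.random((n, n)) < p          # i.i.d. Bernoulli(p) entries
    adj = np.triu(upper, k=1)               # keep the upper triangle: each edge once
    return adj.sum(0) + adj.sum(1)          # degree = row + column incidences

def estimate_eps(degree_sampler, d, K, trials=2000):
    """Monte Carlo estimate of P(G is NOT in C(n, d, K))."""
    fails = sum((degree_sampler() > d).sum() > K for _ in range(trials))
    return fails / trials

n, p, K = 200, 0.05, 10
print("trees:", estimate_eps(lambda: degrees_uniform_tree(n), d=5, K=K))
print("ER   :", estimate_eps(lambda: degrees_er(n, p), d=3 * n * p, K=K))
```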
Remark 11. It is worth noting that $(d, K, \epsilon)$-sparsity is capable of characterizing any arbitrarily chosen distribution. Interestingly, for some well-known distributions, such as $\mathcal{G}_{\mathrm{ER}}(n, p)$, this sparsity characterization offers a useful tool for analysis and, moreover, leads to an exact characterization of sample complexity in the noiseless case. For the particular examples presented in Lemmas 22 and 23, the selected threshold and counting parameters are "tight" (up to multiplicative factors), in the sense that the corresponding sample complexity matches (up to multiplicative factors) the lower bounds derived from Theorem 6.3.1; see Corollaries 6.4.1 and 6.4.2.
Algorithm 9: A Three-stage Recovery Scheme. The first stage recovers each column of the matrix $Y$ independently using $\ell_1$-minimization. In the second stage, the correctness of the first stage is verified via consistency-checking, which exploits the fact that the matrix $Y$ to be recovered is symmetric. The parameter $\gamma$ is set to zero for the analysis of noiseless parameter reconstruction.

Data: Matrices of measurements $A$ and $B$
Result: Estimated graph matrix $X$
Step (a): Recovering columns independently:
for $i \in \mathcal{V}$ do
    Solve the following $\ell_1$-minimization and obtain an optimal $x_i$:
        minimize $\|x_i\|_1$
        subject to $\|B x_i - A_i\|_2 \le \gamma$, $x_i \in \mathbb{F}^n$.
end
Step (b): Consistency-checking:
for $\mathcal{S} \subseteq \mathcal{V}$ with $|\mathcal{S}| = n - K$ do
    if $|X_{i,j} - X_{j,i}| > 2\gamma$ for some $i, j \in \mathcal{S}$ then
        continue;
    else
        for $i \notin \mathcal{S}$ do
            Step (c): Resolving unknown entries:
            Update $x_i^{\overline{\mathcal{S}}}$ by solving the linear system $B_{\overline{\mathcal{S}}}\, x_i^{\overline{\mathcal{S}}} = A_i - B_{\mathcal{S}}\, x_i^{\mathcal{S}}$.
        end
        break;
end
return $X = (x_1, \ldots, x_n)$;
Sufficient Conditions
In this subsection, we consider sufficient conditions (achievability) for parameter reconstruction. The proofs rely on constructing a three-stage recovery scheme (Algorithm 9), which consists of three steps: column-retrieving, consistency-checking, and resolving unknown entries. The worst-case running time of this scheme depends on the underlying distribution $\mathcal{G}_n$.⁴ The scheme is presented as follows.
Figure 6.3.1: The recovery of a graph matrix $Y$ using the three-stage scheme in Algorithm 9. The $n-K$ columns of $Y$ colored gray are first recovered via the $\ell_1$-minimization (6.11a)-(6.11c) in step (a) and accepted after passing the consistency check in step (b). Then, symmetry is used to recover the entries marked green. Leveraging the linear measurements again, in step (c), the remaining $K^2$ entries in the white symmetric sub-matrix are obtained by solving Equation (6.12).
1) Three-stage Recovery Scheme:
Step (a): Retrieving columns. In the first stage, using $\ell_1$-norm minimization, we recover each column of $Y$ based on (6.1):

    minimize   $\|x_i\|_1$   (6.11a)
    subject to $\|B x_i - A_i\|_2 \le \gamma$,   (6.11b)
               $x_i \in \mathbb{F}^n$.   (6.11c)
Let $x_i^{\mathcal{S}} := (X_{j,i})_{j \in \mathcal{S}}$ be the length-$|\mathcal{S}|$ column vector consisting of the $|\mathcal{S}|$ coordinates, indexed by $\mathcal{S}$, of $x_i$, the $i$-th retrieved column. We do not restrict the methods for solving the $\ell_1$-norm minimization in (6.11a)-(6.11c), as long as the solution is unique for sparse columns with fewer than $d$ non-zeros (provided a sufficient number of measurements; the parameter $d > 0$ is defined in Definition 6.3.1).

⁴Although for certain distributions the computational complexity is not polynomial in $n$, the scheme still provides insights on the fundamental trade-offs between the number of samples and the probability of error for recovering graph matrices. Furthermore, motivated by the scheme, a polynomial-time heuristic algorithm is provided in Section 6.5 and experimental results are reported in Section 6.6.
Step (b): Checking consistency. In the second stage, we check for errors in the decoded columns $x_1, \ldots, x_n$ using the symmetry (up to noise perturbation) of the graph matrix $Y$. Specifically, we fix a subset $\mathcal{S} \subseteq \mathcal{V}$ with a given size $|\mathcal{S}| = n - K$ for some integer⁵ $0 \le K \le n$. Then we check whether $|X_{i,j} - X_{j,i}| \le 2\gamma$ for all $i, j \in \mathcal{S}$. If not, we choose a different set $\mathcal{S}$ of the same size. This procedure stops when either we find such a subset $\mathcal{S}$ of columns, or we have gone through all possible subsets without finding one. In the latter case, an error is declared and the recovery is unsuccessful. It remains to recover the vectors $x_i$ for $i \notin \mathcal{S}$.
Step (c): Resolving unknown entries. In the former case, for each vector $x_i$ with $i \in \mathcal{S}$, we accept its entries $X_{j,i}$, $j \notin \mathcal{S}$, as correct and therefore, according to the symmetry assumption, we know the entries $X_{j,i}$ with $j \in \mathcal{S}$, $i \notin \mathcal{S}$ (equivalently, $\{x_i^{\mathcal{S}} : i \notin \mathcal{S}\}$), which are used together with the sub-matrices $B_{\mathcal{S}}$ and $B_{\overline{\mathcal{S}}}$ to compute the other entries $X_{j,i}$, $j \notin \mathcal{S}$, of $x_i$ using (6.11b):
$$B_{\overline{\mathcal{S}}}\, x_i^{\overline{\mathcal{S}}} = A_i - B_{\mathcal{S}}\, x_i^{\mathcal{S}}, \quad i \notin \mathcal{S}. \qquad (6.12)$$
Note that, to avoid solving an over-determined system, in practice we solve
$$B^{\mathcal{K}}_{\overline{\mathcal{S}}}\, x_i^{\overline{\mathcal{S}}} = A^{\mathcal{K}}_i - B^{\mathcal{K}}_{\mathcal{S}}\, x_i^{\mathcal{S}}, \quad i \notin \mathcal{S},$$
where $B^{\mathcal{K}}_{\overline{\mathcal{S}}}$ is a $K \times K$ matrix whose rows are the rows of $B_{\overline{\mathcal{S}}}$ indexed by $\mathcal{K} \subseteq \mathcal{V}$ with $|\mathcal{K}| = K$, and $B^{\mathcal{K}}_{\mathcal{S}}$ selects the rows of $B_{\mathcal{S}}$ in the same way. We combine $x_i^{\mathcal{S}}$ and $x_i^{\overline{\mathcal{S}}}$ to obtain a new estimate $x_i$ for each $i \notin \mathcal{S}$. Together with the accepted columns $x_i$, $i \in \mathcal{S}$, they form the estimated graph matrix $X$. We illustrate the three-stage scheme in Figure 6.3.1. In the sequel, we analyze the sample complexity of the three-stage scheme based on the $(d, K, \epsilon)$-sparse distributions defined in Definition 6.3.1.
2) Analysis of the Scheme:
⁵The choice of $K$ depends on the structure of the graph to be recovered; more specifically, $K$ is the counting parameter in Definition 6.3.1. In Theorem 6.3.2 and Corollary 6.3.1, we analyze the sample complexity of the three-stage recovery scheme by characterizing an arbitrary graph via the classes specified in Definition 6.3.1 with a fixed $K$.
Let $\mathbb{F} \equiv \mathbb{R}$ for simplicity of representation and analysis. We now present another of our main theorems. Consider the models defined in Section 6.2. The $\Delta$-probability of error is defined as the maximal probability that the $\ell_2$-norm of the difference between the estimated vector $\hat{x} \in \mathbb{R}^n$ and the original vector $x \in \mathbb{R}^n$ (satisfying $A = Bx + Z$, where both $A$ and $B$ are known to the estimator) exceeds $\Delta > 0$:
$$\rho_{\mathrm{P}}(\Delta) := \sup_{x \in \mathcal{Y}(d)} \mathbb{P}\bigl(\|\hat{x} - x\|_2 > \Delta\bigr),$$
where $\mathcal{Y}(d)$ is the set of all $d$-sparse vectors in $\mathbb{R}^n$ and the probability is taken over the randomness in the generator matrix $B$ and the additive noise $Z$. Given a generator matrix $B$, the corresponding restricted isometry constant, denoted by $\delta_k$, is the smallest positive number such that
$$(1-\delta_k)\,\|x\|_2^2 \;\le\; \|B_{\mathcal{S}}\, x\|_2^2 \;\le\; (1+\delta_k)\,\|x\|_2^2 \qquad (6.13)$$
for all subsets $\mathcal{S} \subseteq \mathcal{V}$ of size $|\mathcal{S}| \le k$ and all $x \in \mathbb{R}^{|\mathcal{S}|}$. Below we state a sufficient condition⁶ derived from the three-stage scheme for parameter reconstruction.
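Before stating it, note that $\delta_k$ as defined in (6.13) involves a maximum over all column subsets, so it can be computed exactly only for small instances. A minimal brute-force sketch (our own illustration) uses the extreme singular values of each column submatrix:

```python
import itertools
import numpy as np

def rip_constant(B, k):
    """Brute-force delta_k from (6.13): the smallest delta with
    (1 - delta)||x||^2 <= ||B_S x||^2 <= (1 + delta)||x||^2 for all |S| <= k.
    Equivalently, the max over subsets S of max(smax(B_S)^2 - 1, 1 - smin(B_S)^2)."""
    n = B.shape[1]
    delta = 0.0
    for size in range(1, k + 1):
        for S in itertools.combinations(range(n), size):
            s = np.linalg.svd(B[:, S], compute_uv=False)   # singular values, descending
            delta = max(delta, s[0] ** 2 - 1, 1 - s[-1] ** 2)
    return delta

# Example: a random Gaussian matrix with unit-norm columns.
rng = np.random.default_rng(1)
B = rng.standard_normal((40, 12))
B /= np.linalg.norm(B, axis=0)
print(rip_constant(B, k=3))
```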
Theorem 6.3.2 (Achievability). Suppose the generator matrix satisfies that $B^{\mathcal{K}}_{\mathcal{S}} \in \mathbb{R}^{K \times K}$ is invertible for all $\mathcal{S} \subseteq \mathcal{V}$ and $\mathcal{K} \subseteq \mathcal{V}$ with $|\mathcal{S}| = |\mathcal{K}| = K$. Let the distribution $\mathcal{G}_n$ be $(d, K, \epsilon)$-sparse. If the three-stage scheme in Algorithm 9 is used for recovering a graph matrix $Y(G)$ of a graph $G$ sampled according to $\mathcal{G}_n$, then the probability of error satisfies $\varepsilon_{\mathrm{P}}(\xi) \le \epsilon + (n-K)\,\rho_{\mathrm{P}}(\Delta)$ with $\xi$ greater than or equal to
$$2\sqrt{n}\,\Delta + \frac{\Delta\,\|B\|_2 + \gamma}{1-\delta_{2K}}\Bigl(2(n-K) + K\,\eta(B)\Bigr),$$
where $\delta_{2K}$ is the corresponding restricted isometry constant of $B$ with $k = 2K$ defined in (6.13) and
$$\eta(B) := \max_{\mathcal{S}, \mathcal{K} \subseteq \mathcal{V},\; |\mathcal{S}| = |\mathcal{K}| = K} \bigl\|B_{\mathcal{S}}\bigr\|_2 \,\Bigl\|\bigl(B^{\mathcal{K}}_{\mathcal{S}}\bigr)^{-1}\Bigr\|_2.$$
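The quantity $\eta(B)$ can likewise be evaluated by brute force on small instances. The sketch below (again our own illustration) enumerates the column subsets $\mathcal{S}$ and row subsets $\mathcal{K}$ in its definition; rows of $B$ play the role of $\mathcal{K}$ here.

```python
import itertools
import numpy as np

def eta(B, K):
    """Brute-force eta(B): max over column subsets S and row subsets (|S| = K)
    of ||B_S||_2 * ||(B_S^K)^{-1}||_2, skipping singular K x K blocks
    (excluded by the invertibility assumption of Theorem 6.3.2)."""
    m, n = B.shape
    best = 0.0
    for S in itertools.combinations(range(n), K):
        BS = B[:, S]
        norm_BS = np.linalg.norm(BS, 2)          # spectral norm ||B_S||_2
        for rows in itertools.combinations(range(m), K):
            block = BS[list(rows), :]            # the K x K matrix B_S^K
            if np.linalg.matrix_rank(block) < K:
                continue
            best = max(best, norm_BS * np.linalg.norm(np.linalg.inv(block), 2))
    return best

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 5))
print(eta(B, K=2))
```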
The proof is in Appendix 6.B. The theory of classical compressed sensing (see [182, 183, 193]) implies that for noiseless parameter reconstruction, if the generator matrix $B$ has restricted isometry constants $\delta_{2d}$ and $\delta_{3d}$ satisfying $\delta_{2d} + \delta_{3d} < 1$, then all columns $x_i$ with $i \in \mathcal{V}_{\mathrm{Small}}(d)$ are correctly recovered by the minimization in (6.11a)-(6.11c). Denote by $\mathrm{spark}(B)$ the smallest number of columns of $B$ that are linearly dependent (see [194] for the requirements on the spark of the generator matrix that guarantee desired recovery criteria). The following corollary improves Theorem 6.3.2 for the noiseless case. The proof is postponed to Appendix 6.C.

⁶Note that $\gamma$ cannot be chosen arbitrarily and $\Delta$ depends on $\gamma$; otherwise, the probability of error $\rho_{\mathrm{P}}(\Delta)$ will blow up. Theorem 6.4.2 indicates that for Gaussian ensembles, setting $\Delta = O(\gamma) = O(\sqrt{m}\,\sigma_{\mathrm{N}})$ is a valid choice, where $\sigma_{\mathrm{N}}$ is the standard deviation of each independent entry $Z_{i,j}$ of $Z$.
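On small examples, $\mathrm{spark}(B)$ can also be computed by direct enumeration. The following sketch (our own illustration) returns the size of the smallest linearly dependent column subset, with the convention $n+1$ when all $n$ columns are independent:

```python
import itertools
import numpy as np

def spark(B):
    """Smallest number of columns of B that are linearly dependent.
    Returns n + 1 if all n columns are linearly independent."""
    m, n = B.shape
    for size in range(1, n + 1):
        for S in itertools.combinations(range(n), size):
            if np.linalg.matrix_rank(B[:, S]) < size:
                return size                  # found a dependent subset
    return n + 1

# Example: appending a duplicate column forces spark = 2.
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 3))
B2 = np.hstack([B, B[:, :1]])
print(spark(B), spark(B2))                   # generically: 4 and 2
```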
Corollary 6.3.1. Let $Z = 0$ and suppose the generator matrix $B$ has restricted isometry constants $\delta_{2d}$ and $\delta_{3d}$ satisfying $\delta_{2d} + \delta_{3d} < 1$ and, furthermore, $\mathrm{spark}(B) > 2K$. If the distribution $\mathcal{G}_n$ is $(d, K, \epsilon)$-sparse, then the probability of error for the three-stage scheme to recover the graph matrix $Y(G)$ of a graph $G$ sampled according to $\mathcal{G}_n$ satisfies $\varepsilon_{\mathrm{P}} \le \epsilon$.