

5.2 CS THEORY

5.2.5 Performance Guarantees


where $\bar{A} = R^{-1/2} A$ and $\bar{y} = R^{-1/2} y$. This problem is equivalent to (5.12) when $\lambda$ is chosen correctly, as detailed in Section 5.3.1.1. Readers familiar with adaptive processing will recognize the application of $R^{-1/2}$ as a pre-whitening step. Indeed, this processing is the $\ell_1$ version of the typical pre-whitening followed by matched filtering operation used in, for example, STAP [15,30].
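To make the pre-whitening step concrete, the following sketch (in Python with NumPy; the variable names are ours, not the book's) applies a Cholesky-based whitening transform to a toy problem. Any square root of R yields an equivalent whitened problem, so the Cholesky factor stands in for $R^{-1/2}$ here.

```python
import numpy as np

def prewhiten(A, y, R):
    """Whitening transform: A_bar = R^{-1/2} A, y_bar = R^{-1/2} y.

    Uses the Cholesky factor R = L L^H; applying L^{-1} whitens the noise
    (any square root of R gives an equivalent problem up to a unitary map).
    """
    L = np.linalg.cholesky(R)
    return np.linalg.solve(L, A), np.linalg.solve(L, y)

# Toy usage: noise with known covariance R is whitened to identity covariance.
rng = np.random.default_rng(0)
M, N = 20, 50
A = rng.standard_normal((M, N))
R = np.cov(rng.standard_normal((M, 10 * M)))        # a plausible noise covariance
y = np.linalg.cholesky(R) @ rng.standard_normal(M)  # noise-only measurement
A_bar, y_bar = prewhiten(A, y, R)
```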

Returning to the geometric interpretation of the problem, examination of Figure 5-1 provides an intuitive geometric reason that the $\ell_1$ norm is effective for obtaining sparse solutions. In particular, sparse solutions contain numerous zero values and thus lie on the coordinate axes in several of their dimensions. Since the $\ell_1$ unit ball is "spiky" (i.e., more pointed along the coordinate axes than the rounded $\ell_2$ ball), a potential solution x with zero entries will tend to have a smaller $\ell_1$ norm than a non-sparse solution. We could of course consider p < 1 to obtain ever more "spiky" unit balls, as is considered in [31]. Using p < 1 allows sparse signals to be reconstructed from fewer measurements than p = 1, but at the expense of solving a non-convex optimization problem that can exhibit local minima.
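A two-line numerical check of this intuition (a sketch with arbitrary values): among vectors of equal $\ell_2$ norm, the sparse one has the smaller $\ell_1$ norm, and the gap widens as p shrinks below 1.

```python
import numpy as np

sparse = np.array([1.0, 0.0, 0.0, 0.0])   # 1-sparse, ||.||_2 = 1
dense = np.full(4, 0.5)                   # non-sparse, ||.||_2 = 1 as well

for p in (1.0, 0.5):
    lp = lambda v: np.sum(np.abs(v) ** p) ** (1.0 / p)
    print(f"p={p}: sparse={lp(sparse):.3f}, dense={lp(dense):.3f}")
# p=1.0: sparse=1.000, dense=2.000
# p=0.5: sparse=1.000, dense=8.000
```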

This geometric intuition can be formalized using so-called tube and cone constraints as described in, for example, [1]. Using the authors’ terminology, the tube constraint follows from the inequality constraint in the optimization problem represented in equation (5.12):

$\|A(x_{\text{true}} - \hat{x}_\sigma)\|_2 \;\le\; \|Ax_{\text{true}} - y\|_2 + \|A\hat{x}_\sigma - y\|_2 \;\le\; 2\sigma$

The first inequality is an application of the triangle inequality satisfied by any norm, and the second follows from the assumed bound on e and the form of (5.12). Simply put, any vector x that satisfies $\|Ax - y\|_2 \le \sigma$ must lie in a cylinder centered around $Ax_{\text{true}}$. When we solve the optimization problem represented in equation (5.12), we choose the solution inside this cylinder with the smallest $\ell_1$ norm.

Since $\hat{x}_\sigma$ is a solution to the convex problem described in (5.12) and thus a global minimum, we obtain the cone constraint$^{14}$ $\|\hat{x}_\sigma\|_1 \le \|x_{\text{true}}\|_1$. Thus, the solution to (5.12) must lie inside the smallest $\ell_1$ ball that contains $x_{\text{true}}$. Since this $\ell_1$ ball is "spiky", our hope is that its intersection with the cylinder defined by the tube constraint is small, yielding an accurate estimate of the sparse signal $x_{\text{true}}$. These ideas are illustrated in two dimensions in Figure 5-5. The authors of [1] go on to prove just such a result, a performance guarantee for CS. However, sparsity of the true signal is not enough by itself to provide this guarantee.

We will need to make additional assumptions on the matrix A.
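Before moving on, the tube and cone constraints are easy to check numerically. The sketch below, assuming the cvxpy package (not referenced by the book), solves a small instance of (5.12) and verifies both constraints for the recovered solution.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
M, N, s, sigma = 30, 80, 3, 0.1
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
e = rng.standard_normal(M)
e *= sigma / np.linalg.norm(e)           # enforce ||e||_2 <= sigma exactly
y = A @ x_true + e

# Problem (5.12): minimize ||x||_1 subject to ||A x - y||_2 <= sigma.
x = cp.Variable(N)
cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm(A @ x - y, 2) <= sigma]).solve()
x_hat = x.value

print(np.linalg.norm(A @ (x_true - x_hat)) <= 2 * sigma + 1e-6)      # tube constraint
print(np.linalg.norm(x_hat, 1) <= np.linalg.norm(x_true, 1) + 1e-6)  # cone constraint
```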


While the notion of Kruskal rank is important in sparse regularization, it has very limited utility for the problems of interest to radar practitioners. The regular rank of a matrix has limited utility, because an arbitrarily small change in the matrix can alter the rank. Put another way, a matrix can be "almost" rank deficient. In practice, we use measures like the condition number [20] to assess the sensitivity of matrix operations to small errors. The problem with Kruskal rank is analogous. If there exists a sparse vector such that $Ax \approx 0$, this will not violate the Kruskal rank condition. However, when even a small amount of noise is added to the measurements, distinctions based on arbitrarily small differences in the product $Ax$ will not be robust. What we need is a condition on A that guarantees sparse solutions will be unique, but also provides robustness in the presence of noise. As we shall see, this condition will also guarantee successful sparse reconstruction when solving the convex relaxation of our $\ell_0$ problem.
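For intuition, a brute-force Kruskal rank computation (our own sketch, feasible only for tiny matrices) shows both the combinatorial cost and the fragility just discussed: a matrix with numerically near-dependent columns still has full Kruskal rank.

```python
import itertools
import numpy as np

def kruskal_rank(A, tol=1e-10):
    """Largest k such that every set of k columns of A is linearly independent.

    Checks all column subsets, so the cost is exponential -- usable only
    for tiny matrices.
    """
    M, N = A.shape
    for k in range(1, min(M, N) + 1):
        for cols in itertools.combinations(range(N), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k - 1
    return min(M, N)

# Columns 1 and 3 are nearly (but not exactly) dependent, so the Kruskal
# rank is still 2 even though A x is almost zero for a 2-sparse x.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1e-9]])
print(kruskal_rank(A))  # 2
```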

5.2.5.2 The Restricted Isometry Property

An isometry is a continuous, one-to-one invertible mapping between metric spaces that preserves distances [34]. As discussed in the previous section, we want to establish a condition on A that provides robustness for sparse reconstruction. While several conditions are possible, we shall focus on the restricted isometry property (RIP). In particular, we will define the restricted isometry constant (RIC) $R_n(A)$ as the smallest positive constant such that

$(1 - R_n(A))\|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1 + R_n(A))\|x\|_2^2 \qquad (5.14)$

for all x such that $\|x\|_0 \le n$. In other words, the mapping A preserves the energy in sparse signals with n or fewer nonzero coefficients up to a small distortion. We refer to this condition as RIP, since the required approximate isometry is restricted to the set of sparse signals.

As we can see, this condition avoids the problem of arbitrarily small $Ax$ values for sparse x that can occur when only the Kruskal rank condition is required. This property is analogous to a full-rank matrix having a small condition number. Indeed, the RIP can be interpreted as a requirement on the condition number of all submatrices of A with n or fewer columns. Furthermore, notice that A has Kruskal rank of at least n provided that $R_n(A) < 1$. Thus, the RIP guarantees the uniqueness of a sparse solution with any meaningful RIC. To guarantee good SR performance, a smaller RIC is required.
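The RIC can be computed exhaustively for toy sizes, which makes the definition and its condition-number interpretation concrete. A sketch (our construction, not from the book):

```python
import itertools
import numpy as np

def restricted_isometry_constant(A, n):
    """Exhaustive RIC R_n(A): smallest c such that (5.14) holds for all
    n-sparse x. On each n-column submatrix, the extremes of
    ||A x||_2^2 / ||x||_2^2 are the extreme squared singular values, so
    R_n is the largest deviation of a squared singular value from 1.
    """
    ric = 0.0
    for cols in itertools.combinations(range(A.shape[1]), n):
        sv = np.linalg.svd(A[:, cols], compute_uv=False)
        ric = max(ric, abs(sv[0] ** 2 - 1.0), abs(sv[-1] ** 2 - 1.0))
    return ric

rng = np.random.default_rng(2)
A = rng.standard_normal((16, 24)) / np.sqrt(16)  # roughly unit-norm columns
print(restricted_isometry_constant(A, 2))        # requires C(24, 2) = 276 SVDs
```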

We are now in a position to state one of the fundamental results in CS. If $R_{2s}(A) < \sqrt{2} - 1$, then

$\|x_{\text{true}} - \hat{x}_\sigma\|_2 \;\le\; C_0\, s^{-1/2}\, \|x_{\text{true}} - x_{\text{true}}^s\|_1 + C_1 \sigma \qquad (5.15)$

where $C_0$ and $C_1$ are small positive constants whose values and derivation can be found in [35].$^{16}$ First, the term $x_{\text{true}}^s$ is the best s-sparse approximation to $x_{\text{true}}$. Thus, if $x_{\text{true}}$ is truly sparse, then the first term is zero. If the true signal is not actually sparse, then the reconstruction remains well behaved. The second term is a small multiple of the noise energy. If the measurements y are noise free, then solving the problem described in (5.12) with $\sigma = 0$ produces a perfect reconstruction of a truly sparse signal.

$^{16}$The reference is, in our opinion, a concise and elegant proof of this result. More detailed and perhaps pedagogically useful proofs, albeit with slightly inferior guarantees, can be found in [1]. This proof is also limited to the real case, but the extension to complex-valued signals, along with slightly less restrictive RIC results, is provided in [36].


The theorem requires a constraint on $R_{2s}(A)$, even though the signal of interest is assumed to be s-sparse, because the proof, like the previous proof for uniqueness of the sparse solution, relies on preserving the energy of differences between s-sparse signals.
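The quantities in (5.15) are simple to compute. The sketch below forms the best s-sparse approximation $x_{\text{true}}^s$ by keeping the s largest-magnitude entries and evaluates the two terms of the bound; the constants here are placeholders, not the values derived in [35].

```python
import numpy as np

def best_s_sparse(x, s):
    """Best s-sparse approximation: keep the s largest-magnitude entries."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

rng = np.random.default_rng(3)
N, s, sigma = 100, 5, 0.05
x_true = rng.standard_normal(N) * 0.5 ** np.arange(N)  # compressible: fast-decaying
C0, C1 = 5.5, 6.0   # placeholder constants; see [35] for the actual values

tail = C0 * s ** -0.5 * np.linalg.norm(x_true - best_s_sparse(x_true, s), 1)
print("bound on ||x_true - x_hat||_2:", tail + C1 * sigma)
```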

We should emphasize that this condition is sufficient but not necessary. Indeed, good reconstructions using (5.12) are often observed with a measurement matrix that does not even come close to satisfying the RIP. For example, the authors in [37] demonstrate recovery of sinusoids in noise using SR, with performance approaching the Cramér-Rao lower bound, despite having the RIC for A approach unity.$^{17}$ Nonetheless, RIP offers powerful performance guarantees and a tool for proving results about various algorithms, as we shall see in Section 5.3. Indeed, in some cases the effort to prove RIP-based guarantees for an algorithm has led to improvements in the algorithm itself, such as the CoSaMP algorithm [39] discussed in Section 5.3.4. Other conditions exist in the literature, for example, [40–42], and indeed developing less restrictive conditions is an active area of research.

5.2.5.3 Matrices that Satisfy RIP

At this point, it may seem that we have somehow cheated. We started with an NP-hard problem, namely finding the solution with the smallest $\ell_0$ norm. The result given in (5.15) states that we can instead solve the convex relaxation of this problem and, in the noise-free case, obtain the exact same solution. The missing detail is that we have assumed that A has a given RIC. Unfortunately, computing the RIC for a given matrix is an NP-hard task. Indeed, it requires computing an SVD of every possible subset of n columns of A. For a matrix of any meaningful size, this is effectively impossible.

So, it seems that we may have traded an NP-hard reconstruction task for an NP-hard measurement design task. Put another way, for our straightforward convex reconstruction problem to have desirable properties, we must somehow design a measurement matrix A with a property that we cannot even verify. Fortunately, an elegant solution to this problem exists. Rather than designing A, we will use randomization to generate a matrix that will satisfy our RIC requirements with very high probability.$^{18}$

Numerous authors have explored random matrix constructions that yield acceptable RICs with high probability. Matrices with entries chosen from a uniform random distribution, a Gaussian distribution,$^{19}$ or a Bernoulli distribution, as well as other examples, satisfy the required RIP provided that M is greater than $C s \log(N/s)$ for some distribution-dependent constant C [21]. A very important case for radar and medical imaging applications is that a random selection of the rows of a discrete Fourier transform matrix also satisfies a similar condition with $M \ge C s \log^4(N)$ [45]. Furthermore, A can be constructed

$^{17}$See also [38] for an investigation of modifying the RIP to address performance guarantees in situations where the standard RIP is violated.

$^{18}$Several attempts have been made to develop schemes for constructing A matrices deterministically with the desired RIP properties, for example, [47]. However, these results generally require more measurements (i.e., larger M) to guarantee the same RIC. See [43] for an example construction based on Reed-Muller codes that does not satisfy RIP for all vectors but preserves the energy of a randomly drawn sparse vector with high probability. Expander graphs have also been explored as options for constructing appropriate forward operators with accompanying fast reconstruction algorithms; see, for example, [44].

$^{19}$Indeed, this result holds for the wider class of sub-Gaussian distributions.


as $A = \Phi\Psi$, where $\Psi$ is a basis for $\mathbb{C}^N$ and $\Phi$ is a random matrix of one of the mentioned classes [21]. The proofs of these results rely on probabilistic arguments involving concentration of measure; see, for example, [46]. Intuitively, the idea is that random vectors drawn in a very high-dimensional space are unlikely to have large inner products.
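A sketch of this construction and a Monte Carlo sanity check, assuming SciPy for the basis (a DCT standing in for $\Psi$). Note that checking randomly drawn sparse vectors probes typical behavior only; it is weaker than the worst-case statement of RIP.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(4)
N, s = 256, 5
M = int(np.ceil(2 * s * np.log(N / s)))        # M ~ C s log(N/s); C = 2 for illustration

Psi = dct(np.eye(N), norm="ortho")             # an orthonormal basis (DCT) for R^N
Phi = rng.standard_normal((M, N)) / np.sqrt(M) # random Gaussian measurement matrix
A = Phi @ Psi

ratios = []
for _ in range(1000):                          # typical s-sparse vectors, not worst case
    x = np.zeros(N)
    x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)
print(min(ratios), max(ratios))                # concentrated around 1
```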

For random matrices, we have seen that we need to collect on the order of $s\log(N/s)$ measurements. A simple analogy with bit counting may provide some intuition about this bound. If we have N coefficients with s nonzero values, then there are $\binom{N}{s}$ possible sets of nonzero coefficients. If we allow $c_0$ quantization levels for encoding each nonzero coefficient, then the total number of required bits for this information is $\log\binom{N}{s} + c_0 s$. If we neglect the bits for encoding the values, we can use the standard bound $\binom{N}{s} \le (Ne/s)^s$ to obtain

$\log\binom{N}{s} + c_0 s \;\approx\; \log\binom{N}{s} \;\le\; \log\left(\frac{Ne}{s}\right)^{\!s} \qquad (5.16)$

$= \; s\log(N/s) + s\log e \qquad (5.17)$

Thus, a simple calculation of the required coding bits yields a leading-order term of the same form as the number of required measurements predicted by CS theory. Intuitively, randomization of the A matrix ensures with high probability that each measurement provides a nearly constant increment of new information bits. This result in no way constitutes a proof but is presented to provide some intuitive insight about the origin of this expression.
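The back-of-the-envelope bound is easy to verify numerically (a sketch; Python's integer binomial allows exact arithmetic):

```python
import math

N, s = 10_000, 50
exact = math.log(math.comb(N, s))      # log C(N, s), computed exactly
bound = s * math.log(N / s) + s        # s log(N/s) + s log e, as in (5.17)
print(exact, bound)                    # the bound modestly overestimates the exact value
```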

We should mention that randomization has a long history in radar signal processing.

For example, array element positions can be randomized to reduce the sidelobes in sparsely populated arrays [48–50]. It is also well understood that jittering or staggering the pulse repetition frequency can eliminate ambiguities [13]. The transmitted waveform itself can also be randomized, as in noise radar [51,52], to provide a thumbtack-like ambiguity function. From a CS perspective, these randomization techniques can be viewed as attempts to reduce the mutual coherence of the forward operator A [17].

5.2.5.4 Mutual Coherence

As pointed out already, estimating and testing the RIC for large M is impractical. A tractable yet conservative bound on the RIC can be obtained through the mutual coherence of the columns of A, defined as

$M(A) = \max_{i \neq j} \left| A_i^H A_j \right|$

Mutual coherence can be used to guarantee stable inversion through $\ell_1$ recovery [53,54], although these guarantees generally require fairly small values of s. Furthermore, the RIC is conservatively bounded by $M(A) \le R_s(A) \le (s-1)M(A)$. The upper bound is very loose, as matrices can be constructed for which the RIC is nearly equal to the mutual coherence over a wide range of s values [55].
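Unlike the RIC, the mutual coherence costs only one Gram matrix to compute. A sketch (the definition assumes unit-norm columns, which the normalization below enforces):

```python
import numpy as np

def mutual_coherence(A):
    """M(A) = max_{i != j} |a_i^H a_j| over unit-normalized columns a_i."""
    An = A / np.linalg.norm(A, axis=0)   # normalize each column
    G = np.abs(An.conj().T @ An)         # magnitudes of the Gram matrix
    np.fill_diagonal(G, 0.0)             # exclude the diagonal (i = j)
    return G.max()

rng = np.random.default_rng(5)
A = rng.standard_normal((32, 128))
mu, s = mutual_coherence(A), 4
print(f"M(A) = {mu:.3f};  {mu:.3f} <= R_s(A) <= {(s - 1) * mu:.3f}")
```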

The mutual coherence is of particular importance in radar signal processing. Recall from Section 5.2.2.2 that entries of the Gramian matrix $A^H A$ are samples of the radar ambiguity function. The mutual coherence is simply the largest off-diagonal entry (in magnitude) of this matrix.

Thus, the mutual coherence of a radar system can be reduced by designing the ambiguity function appropriately. This view of the ambiguity function was explored, along with a deterministic approach to constructing waveforms that yield low mutual coherence for the resulting A, in [56]. In a nutshell, the thumbtack ambiguity functions that are known to be desirable for radar systems [57] are also beneficial for CS applications. In [12], the authors use mutual coherence as a surrogate for RIP when designing waveforms for multistatic SAR imaging. As one might expect, noise waveforms provide good results in both scenarios.$^{20}$

The ambiguity function characterizes the response of a matched filter to the radar data. At the same time, the ambiguity function determines the mutual coherence of the forward operator A, which provides insight into the efficacy of SR and CS. Thus, CS does not escape the limitations imposed by the ambiguity function and the associated matched filter. Note that virtually all SR algorithms apply the matched filter $A^H$ repeatedly in their implementations. Indeed, SR algorithms leverage knowledge of the ambiguity function to approximately deconvolve it from the reconstructed signal.

Put another way, SR can yield signal estimates that lack the sidelobe structure typical of a matched filtering result, but the extent to which this process will be successful is informed by the ambiguity function.
