5.2 CS THEORY
5.2.3 Regularization of the Linear Model through Sparsity
FIGURE 5-4   Two reconstructions of a signal with $N = 256$, $M = 75$, and $s = 15$. The entries of $A$ are generated randomly from a Gaussian distribution and then scaled to yield unit-norm columns. The true signal is shown with circles, while the estimate is shown with crosses. Pane (a) shows the minimum $\ell_2$ reconstruction, while pane (b) shows the minimum $\ell_1$ reconstruction. A very similar example was shown in [21]. The reconstructions were computed using the CVX software package [22].
5.2.4 $\ell_1$ Regularization
In Section 5.3 we will explore numerous algorithms for SR. Here, we will examine a problem formulation that motivates many of these algorithms. Let us return to the noisy data case of (5.1), where $e \ne 0$. In this setting, we would like to find the solution $\hat{x}$ given by
$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad \|Ax - y\|_2 \le \sigma \tag{5.11}$$

However, this problem is once again NP-hard and effectively impossible to solve for problems of interest in radar signal processing. As mentioned earlier, the issue is that the $\ell_0$ norm is not amenable to optimization. Figure 5-1 provides an intuitive alternative: we can replace the intractable $\ell_0$ norm with a similar norm for which optimization is simpler. We have already seen that the $\ell_2$ norm provides one possibility, but the resulting solutions tend to be nonsparse. Instead, we will consider the convex relaxation [23] of (5.11) using the $\ell_1$ norm:
$$\hat{x}_\sigma = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \sigma \tag{5.12}$$
We will refer to this convex optimization problem as Basis Pursuit De-Noising (BPDN).¹¹ Because it minimizes a convex cost function over a convex constraint set, the problem described in (5.12) does not suffer from local minima, and a variety of mature techniques exist for solving it in polynomial time [24]. Figure 5-4(b) shows the reconstruction of our simple example signal with an $\ell_1$ penalty. In this noise-free case, the signal is reconstructed perfectly using the $\ell_1$-based cost function. Notice that this optimization problem has an obvious parameter, $\sigma$, that could be varied to obtain different solutions. We will explore this idea in depth in Section 5.3.1.1.
¹¹ See Section 5.3.1.1 for details on our naming convention.
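As a concrete illustration of (5.12), the sketch below poses BPDN directly in CVXPY, a Python modeling package analogous to the CVX package used to produce Figure 5-4. The problem sizes mirror the figure, but the noise bound, random seed, and solver defaults are illustrative assumptions rather than values taken from the text.

```python
import numpy as np
import cvxpy as cp

# Illustrative sizes matching Figure 5-4; the noise bound sigma is an assumed value.
N, M, s = 256, 75, 15
sigma = 1e-3

rng = np.random.default_rng(0)
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)              # unit-norm columns, as in the figure

x_true = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x_true[support] = rng.standard_normal(s)    # s-sparse ground truth

e = rng.standard_normal(M)
e *= sigma / np.linalg.norm(e)              # scale the noise so that ||e||_2 = sigma
y = A @ x_true + e

# BPDN (5.12): minimize ||x||_1 subject to ||Ax - y||_2 <= sigma
x = cp.Variable(N)
problem = cp.Problem(cp.Minimize(cp.norm1(x)),
                     [cp.norm(A @ x - y, 2) <= sigma])
problem.solve()
x_hat = x.value
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```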
Regularization using the $\ell_1$ norm has a long history; see, for example, [25]. We shall discuss several formulations of the problem described in (5.12), and algorithms for solving it, in Section 5.3.1. When the problem is solved with an $\ell_2$ penalty in place of the $\ell_1$ norm, the result is termed Tikhonov regularization [26],¹² which is known in the statistics community as ridge regression [28]. This formulation has the advantage of offering a simple, closed-form solution that can be implemented robustly with an SVD [20]. Unfortunately, as in the noise-free case, this approach does not promote sparsity in the resulting solutions.
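For contrast with the $\ell_1$ formulation, here is a minimal sketch of the Tikhonov/ridge solution evaluated through the SVD; the function name and the regularization weight `lam` are illustrative choices, not the book's code.

```python
import numpy as np

def tikhonov_svd(A, y, lam):
    """Ridge/Tikhonov solution argmin_x ||Ax - y||_2^2 + lam * ||x||_2^2,
    computed stably via the SVD of A (illustrative sketch)."""
    U, svals, Vh = np.linalg.svd(A, full_matrices=False)
    # Each singular component is shrunk by s_i / (s_i^2 + lam); none is zeroed out.
    filt = svals / (svals ** 2 + lam)
    return Vh.conj().T @ (filt * (U.conj().T @ y))
```

Note that every singular component is merely shrunk, never set exactly to zero, which is one way to see why the resulting estimate is generally non-sparse.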
We mention Tikhonov regularization because it has a well-known Bayesian interpretation using Gaussian priors. It turns out that the $\ell_1$-penalized reconstruction can also be derived using a Bayesian approach.
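For reference, the Gaussian-prior version of that statement is the one-line MAP calculation sketched below (with a white-noise assumption, taken purely for brevity); it parallels the $\ell_1$ derivation that follows.

$$p(x) \propto \exp\left(-\tfrac{\lambda}{2}\|x\|_2^2\right),\ e \sim \mathcal{CN}(0, I) \;\Longrightarrow\; \hat{x}_{\text{MAP}} = \arg\max_{x}\, p(y \mid x)\,p(x) = \arg\min_{x}\, \|Ax - y\|_2^2 + \lambda\|x\|_2^2,$$

which is exactly the Tikhonov/ridge cost.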
To cast the estimation of $x_{\text{true}}$ in a Bayesian framework, we must adopt priors on the signal and disturbance. First, we will adopt a Laplacian prior¹³ on the unknown signal $x_{\text{true}}$ and assume that the noise $e$ is circular Gaussian with known covariance $\Gamma$, that is,

$$e \sim \mathcal{CN}(0, \Gamma), \qquad p(x_{\text{true}}) \propto \exp\left(-\tfrac{\lambda}{2}\|x_{\text{true}}\|_1\right),$$

where the normalization constant on $p(x_{\text{true}})$ is omitted for simplicity. Given no other information, we could set $\Gamma = I$, but we will keep the generality. We can then find the MAP estimate easily as
$$\begin{aligned}
\hat{x}_\lambda &= \arg\max_{x}\; p(x \mid y) \\
&= \arg\max_{x}\; \frac{p(y \mid x)\,p(x)}{p(y)} \\
&= \arg\max_{x}\; p(y \mid x)\,p(x) \\
&= \arg\max_{x}\; \exp\left(-\tfrac{1}{2}\|Ax - y\|_\Gamma^2\right)\exp\left(-\tfrac{\lambda}{2}\|x\|_1\right) \\
&= \arg\min_{x}\; \|Ax - y\|_\Gamma^2 + \lambda\|x\|_1
\end{aligned}$$
where $\|x\|_\Gamma^2 = x^H \Gamma^{-1} x$. The resulting optimization problem is precisely what we would expect given the colored Gaussian noise prior. Since $\Gamma$ is a covariance matrix, and hence positive definite and symmetric, the problem is convex and solvable with a variety of techniques. In fact, we can factor the inverse of the covariance using the Cholesky decomposition as $\Gamma^{-1} = R^H R$ to obtain
$$\begin{aligned}
\hat{x}_\lambda &= \arg\min_{x}\; \|Ax - y\|_\Gamma^2 + \lambda\|x\|_1 \\
&= \arg\min_{x}\; \|RAx - Ry\|_2^2 + \lambda\|x\|_1 \\
&= \arg\min_{x}\; \|\bar{A}x - \bar{y}\|_2^2 + \lambda\|x\|_1
\end{aligned} \tag{5.13}$$
¹² An account of the early history of Tikhonov regularization, dating to 1955, is given in [27].
¹³ Recent analysis has shown that, while the Laplacian prior leads to several standard reconstruction algorithms, random draws from this distribution are not compressible. Other priors leading to the same $\ell_1$ penalty term but yielding compressible realizations have been investigated. See [29] for details.
where $\bar{A} = RA$ and $\bar{y} = Ry$. This problem is equivalent to (5.12) when $\lambda$ is chosen correctly, as detailed in Section 5.3.1.1. Readers familiar with adaptive processing will recognize the application of $R$ as a pre-whitening step. Indeed, this processing is the $\ell_1$ version of the typical pre-whitening followed by matched filtering operation used in, for example, STAP [15,30].
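A minimal sketch of this pre-whitening step, under the assumption of real-valued data and a known covariance `Gamma`, is shown below; it forms $\bar{A} = RA$ and $\bar{y} = Ry$ from the Cholesky factor of $\Gamma^{-1}$ and then hands the unconstrained problem (5.13) to CVXPY. The helper name `whiten_and_solve` and the choice of `lam` are hypothetical.

```python
import numpy as np
import cvxpy as cp

def whiten_and_solve(A, y, Gamma, lam):
    """Sketch of (5.13): pre-whiten with R (where R^H R = inv(Gamma)), then
    solve argmin_x ||A_bar x - y_bar||_2^2 + lam * ||x||_1."""
    # Cholesky of the inverse covariance: L @ L^H = Gamma^{-1}, so R = L^H
    L = np.linalg.cholesky(np.linalg.inv(Gamma))
    R = L.conj().T
    A_bar, y_bar = R @ A, R @ y

    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.sum_squares(A_bar @ x - y_bar)
                           + lam * cp.norm1(x))).solve()
    return x.value
```

For the complex-valued radar case, the unknown would be declared as a complex variable, but the whitening step itself is unchanged.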
Returning to the geometric interpretation of the problem, examination of Figure 5-1 provides an intuitive reason that the $\ell_1$ norm is effective for obtaining sparse solutions. In particular, sparse solutions contain numerous zero values and thus lie on the coordinate axes in several of their dimensions. Since the $\ell_1$ unit ball is "spiky" (i.e., more pointed along the coordinate axes than the rounded $\ell_2$ ball), a potential solution $x$ with zero entries will tend to have a smaller $\ell_1$ norm than a non-sparse solution. We could of course consider $p < 1$ to obtain ever more "spiky" unit balls, as is considered in [31]. Using $p < 1$ allows sparse signals to be reconstructed from fewer measurements than $p = 1$, but at the expense of solving a non-convex optimization problem that can exhibit local minima.
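The "spikiness" argument can be checked numerically: among vectors with the same $\ell_2$ energy, a sparse vector has a much smaller $\ell_1$ norm, and the gap widens further for $p < 1$. The particular vectors below are arbitrary illustrative choices.

```python
import numpy as np

def lp_cost(x, p):
    """The penalty sum_i |x_i|^p minimized by the l_p (quasi-)norm relaxations."""
    return np.sum(np.abs(x) ** p)

# Two unit-l2-energy vectors: one 5-sparse, one fully dense (arbitrary examples).
sparse = np.zeros(100)
sparse[:5] = 1.0 / np.sqrt(5)
dense = np.full(100, 0.1)

for p in (2.0, 1.0, 0.5):
    print(f"p = {p}: sparse cost = {lp_cost(sparse, p):.2f}, dense cost = {lp_cost(dense, p):.2f}")
# p = 2 cannot distinguish the two (both cost 1), while p <= 1 strongly favors the sparse vector.
```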
This geometric intuition can be formalized using so-called tube and cone constraints as described in, for example, [1]. Using the authors’ terminology, the tube constraint follows from the inequality constraint in the optimization problem represented in equation (5.12):
$$\begin{aligned}
\|A(x_{\text{true}} - \hat{x}_\sigma)\|_2 &\le \|Ax_{\text{true}} - y\|_2 + \|A\hat{x}_\sigma - y\|_2 \\
&\le 2\sigma
\end{aligned}$$
The first line is an application of the triangle inequality satisfied by any norm, and the second follows from the assumed bound on $e$ and the form of (5.12). Simply put, any vector $x$ that satisfies $\|Ax - y\|_2 \le \sigma$ must lie in a cylinder centered around $Ax_{\text{true}}$. When we solve the optimization problem represented in equation (5.12), we choose the solution inside this cylinder with the smallest $\ell_1$ norm.
Since $\hat{x}_\sigma$ is a solution to the convex problem described in (5.12), and thus a global minimum, we obtain the cone constraint¹⁴ $\|\hat{x}_\sigma\|_1 \le \|x_{\text{true}}\|_1$. Thus, the solution to (5.12) must lie inside the smallest $\ell_1$ ball that contains $x_{\text{true}}$. Since this $\ell_1$ ball is "spiky," our hope is that its intersection with the cylinder defined by the tube constraint is small, yielding an accurate estimate of the sparse signal $x_{\text{true}}$. These ideas are illustrated in two dimensions in Figure 5-5. The authors of [1] go on to prove just such a result, a performance guarantee for CS. However, sparsity of the true signal is not enough by itself to provide this guarantee.
We will need to make additional assumptions on the matrix A.