Practical Compressed Sensing: Modern Data Acquisition and Signal Processing

This thesis presents in detail the design of one of the world's first compressed sensing hardware devices, the random modulation pre-integrator (RMPI). The general concept of the RMPI is not original to this thesis (see [SOS+05, KLW+06, LKD+07, TLD+10]), but this device is one of the world's first working hardware devices built on the compressed sensing paradigm.

Figure 1.1: Diagram of the version 1 RMPI chip

Applications

Problems of interest

In the case that A is ill-conditioned, a classical technique to regularize the problem (known as Tikhonov regularization or ridge regression) is given in (1.1.2). The second equality there holds when γ is large enough that $A^T A + \gamma^2 I$ is invertible. The last canonical problem we discuss is the Dantzig selector [CT07a], which motivated the development of the TFOCS algorithm.
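For reference, the textbook form of Tikhonov regularization can be written as below. This is a sketch of the standard formulation only: the symbol $\hat{x}_\gamma$ is introduced here for illustration, and the scaling used in the thesis's own equation (1.1.2) may differ.

```latex
\hat{x}_{\gamma}
  \;=\; \operatorname*{argmin}_{x}\; \|Ax - b\|_2^2 + \gamma^2 \|x\|_2^2
  \;=\; \left(A^T A + \gamma^2 I\right)^{-1} A^T b
```

The closed form follows from setting the gradient $2A^T(Ax - b) + 2\gamma^2 x$ to zero, which is where the invertibility of $A^T A + \gamma^2 I$ enters.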

Figure 1.6: Preview: image denoising with TFOCS. For full details, see Figure 4.8. The image on the right is a denoised version of the noisy image on the left.

Historical development

Classical signal processing
Estimation and the rise of alternatives to least-squares
Leading up to compressed sensing
Compressed Sensing

Its idea of exploiting sparsity and finishing has been extremely useful in astronomy; to quote from [Cor09], "The importance of CLEAN in radio astronomy has been enormous." The RIP-p is defined as the RIP is, except using the $\ell_p$ norm instead of the $\ell_2$ norm.

Figure 1.8: Group testing. Suppose N soldiers are tested for a disease using a sensitive but expensive blood test;
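The idea behind group testing can be illustrated with a small counting exercise. The sketch below assumes a simple two-stage pooling scheme (test pooled samples first, then retest members of positive pools); this particular scheme, the function name, and the parameter values are illustrative assumptions and are not taken from the thesis.

```python
def two_stage_pooled_tests(infected, n_people, pool_size):
    """Count blood tests used by a two-stage pooling scheme.

    Stage 1 runs one test on the mixed sample of each pool.
    Stage 2 retests every member of each pool that came back positive.
    """
    infected = set(infected)
    tests = 0
    found = set()
    for start in range(0, n_people, pool_size):
        pool = range(start, min(start + pool_size, n_people))
        tests += 1  # one test on the pooled sample
        if any(person in infected for person in pool):
            tests += len(pool)  # retest each member individually
            found.update(p for p in pool if p in infected)
    return tests, found


# One infected soldier out of 100, pools of 10.
tests, found = two_stage_pooled_tests(infected={17}, n_people=100, pool_size=10)
print(tests, sorted(found))  # 20 [17]
```

When very few people are infected, the pooled scheme needs far fewer than N tests, which is the same intuition that lets compressed sensing use fewer measurements than unknowns for sparse signals.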

The need for the RMPI

The work in [BGI+08] introduces a simple extension of the RIP to the so-called RIP-p, and in particular the RIP-1. Another extension of the RIP is the model-based RIP [BCDH10], which is restricted to a specific signal model; if the model is the set of k-sparse signals, then this is just the usual RIP.

Principles of the RMPI

The NUS

Therefore, the NUS chooses Ω via a pseudo-random number generator (note that Ω must of course be known when trying to reconstruct the signal). The NUS can take samples that are only ∆T apart, so at its fastest it must still sample at $f_s$.
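A minimal sketch of how such a sampling set could be generated is shown below. It assumes that sample times lie on a uniform grid with spacing ∆T, so that any two distinct grid points are at least ∆T apart; the function name, grid size, and seed are hypothetical and chosen only for illustration.

```python
import random


def nus_sample_indices(n_grid, n_samples, seed):
    """Pick n_samples distinct points of a uniform time grid, in time order.

    Grid index i stands for time i * delta_T. A fixed seed makes the set
    reproducible, so the reconstruction side can regenerate the same set.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_grid), n_samples))


omega = nus_sample_indices(n_grid=1000, n_samples=50, seed=0)
# Gaps between consecutive samples, in units of delta_T.
gaps = [b - a for a, b in zip(omega, omega[1:])]
print(len(omega), min(gaps))
```

Seeding the generator is the design point here: the sampler and the reconstruction algorithm only need to share the seed in order to agree on Ω.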

The RMPI

The goal of the RMPI design is to approximate a random signed Bernoulli matrix as closely as possible. For a given number of measurements per unit time, $M/T$ (i.e., the information rate), a decrease in channels can be compensated for by an increase in the sampling rate.

Figure 1.10: Sample measurement matrix. The green entries are zero or nearly zero. This particular matrix has rows ordered such that the measurements from each channel are grouped together.

Optimization background

The first is nonlinear conjugate gradient; when this can be used it is likely to be fast. It is used to solve the non-convex rank minimization problem in [BM03] and more recently in [MT10].

Figure 1.11: Using noiseless basis pursuit (1.1.5) with m = n/2 measurements and a sparse solution with k = m/5 nonzeros

Reading guide

Signal class

For now, these signals are not part of the signal model and do not affect us. The goal is an ambitious one: complete reconstruction of the signal to the accuracy allowed by the finite model.

Figure 2.2: Radar pulses. Top row: on the left and middle, two sample pulse windows. On the right, a radar pulse in the time domain

The design

Basic design

These ideas can be interpreted as the RMPI analogue of the “minimum distance” concept used in coding theory [McE02]. The inputs are shown at the top left and are then convolved with the spectrum of the PRBS (shown in black below).

Figure 2.3: Principles of the RMPI. Each horizontal section represents a channel, and each channel has a unique PRBS sequence of ±1’s

Theoretical performance

Input noise and channelized receivers

For the RMPI, the mean squared column norm of $\Phi_{\mathrm{RMPI}}$ is $r_{\mathrm{ch}}$ (this is just $\|\Phi_{\mathrm{RMPI}}\|_F^2/n = \operatorname{trace}(\Phi_{\mathrm{RMPI}}^T \Phi_{\mathrm{RMPI}})/n$). Channelized receivers are the current technology for high-bandwidth systems; see Figure 2.6 for a frequency domain description.

Figure 2.6: Left: two pulses with noise in 2.5 GHz of bandwidth. Right: channelizing the input into channels of width 50 MHz

Related literature

Other RMPI systems
Related systems

The “Xampling” methodology [ME10, MEE09, Eld09, MEDS11], which builds on the authors' blind multiband sampling theorem [ME09a], is similar in many respects to the RMPI. However, below we describe some minor criticisms of the MWC and address some comments from [MEE09] about the RMPI. The extent to which this alias signal changes over time is determined by the “Nyquist zone” in which the carrier signal was located.

Modeling the system

Simple models
SPICE
Simulink
Calibration
Phase blind calibration

The frequency response of the system at a frequency f is H(if), since it is just the Fourier transform of h. The simplest model of the system is as a block-diagonal matrix with ±1 entries on the non-zero portions. The problem with this procedure is that the phase θ of the incoming signal is unknown.
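The simplest model mentioned above can be sketched in a few lines. The code below is an idealized illustration under stated assumptions: each channel multiplies the Nyquist-rate input by its own random ±1 chip sequence and sums over disjoint windows, with no filtering or non-idealities. The function name, dimensions, and seed are hypothetical.

```python
import random


def rmpi_matrix(n_channels, n_nyquist, window, seed=0):
    """Idealized RMPI measurement matrix as a list of rows.

    Each channel has its own random +/-1 chip sequence. Every row sums one
    window of `window` consecutive Nyquist samples, so each channel's block
    is block-diagonal. Rows are grouped by channel.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_channels):
        chips = [rng.choice((-1, 1)) for _ in range(n_nyquist)]
        for start in range(0, n_nyquist, window):
            row = [0] * n_nyquist
            row[start:start + window] = chips[start:start + window]
            rows.append(row)
    return rows


phi = rmpi_matrix(n_channels=4, n_nyquist=24, window=6)
n_cols = len(phi[0])
# Squared Frobenius norm divided by the number of columns.
mean_sq_col_norm = sum(v * v for row in phi for v in row) / n_cols
print(len(phi), n_cols, mean_sq_col_norm)  # 16 24 4.0
```

In this idealized model every column contains exactly one ±1 entry per channel, so the mean squared column norm equals the number of channels.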

Figure 2.7: Showing samples for one channel of various models (showing every other sample for visual clarity), all with the same chip sequence

General design considerations

Test signals

The performance of the system was primarily evaluated on its accuracy in reconstructing single pulses, although multiple pulses were occasionally tested. The simulations consisted of either "corner simulations" using extreme types of pulses (see Figure 2.18) or a combination of systematic parameters, such as pulse length, and random parameters, such as phase and frequency of the carrier wave. The pulse window was occasionally changed as this affects the spectral sparsity of the reconstruction.

Error metrics

This limit is always smaller than the dynamic range of a single pulse, as the large pulse introduces extra difficulties for the small pulse, such as creating extra errors due to jitter, as well as creating non-linear effects. Our demodulation method works well in multi-signal environments as the filtering discards the background signals. Errors are then recorded as a function of how far the estimated frequency is from the true frequency, |f−f0|.

Number of channels

The main disadvantage is that it requires an accurate estimate of the frequency, but it is not too sensitive to this, so we find that it works well in practice. Consider a single-pole integrator, and assume the time constant (which is the inverse of the pole location) is constant regardless of the number of channels. Because of the full reconstruction, we are able to examine the root-mean-square error (RMSE).

Figure 2.19: Matched filter simulations. Top row plots the error of frequency estimation, bottom row uses a 1 MHz frequency estimation cutoff to determine “successes” and “failures”, and plots the success rate (using 200 independent trials)

Chip sequence

Spectral properties of the chip sequence

Infinite period
Finite period

We wish to find the power spectral density of a random chip sequence of infinite length. The theorem is useful because stationary signals are not necessarily square integrable, so care must be taken when using the Fourier transform. The insight is that a time shift of c(t) will have the same PSD (since it only changes the phase of the Fourier transform), so we will consider shifting the chip sequence by a random amount.
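As a rough numerical companion, the sketch below estimates the autocorrelation of an i.i.d. ±1 chip sequence sampled once per chip. It assumes the chips are independent and equally likely to be +1 or −1, and it does not reproduce the continuous-time PSD derived in the thesis; it only illustrates that the chip-rate autocorrelation is concentrated at lag zero, which is the quantity a Wiener-Khinchin style argument would transform into a PSD. The function name and sequence length are illustrative.

```python
import random


def chip_autocorrelation(chips, max_lag):
    """Empirical autocorrelation of a chip sequence for lags 0..max_lag."""
    n = len(chips)
    return [
        sum(chips[i] * chips[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]


rng = random.Random(1)
chips = [rng.choice((-1, 1)) for _ in range(20000)]
acf = chip_autocorrelation(chips, max_lag=5)
print(acf[0])  # 1.0, since every chip squared equals 1
```

The nonzero lags are small but not exactly zero for a finite sequence, which is one reason the plotted PSD in a figure of this kind is typically obtained by averaging many realizations.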

Figure 2.21: PSD Ψ of a chip sequence with infinite repetition rate; see (2.5.9). This plot was generated by averaging many sample realizations of very long chip sequences

Chip design considerations

Chip sequence rate
Chip sequence period
Case study: test of NG chip sequence

For the left column, the energy of the ADC samples in the blue tone was 0.69 times that of the red tone. Noise is included in the measurement; these data are a subset of the same data presented differently in Figure 2.30. Failure Criterion 1: the peak of the spectrum was not close to the actual carrier frequency.

Figure 2.24: Effect of chipping rate. See also the companion Figure 2.25. Note that chipping at an infinite frequency is not actually desirable.

Integration

General constraints

Northrop Grumman integrator design
Multipole systems

The larger the flat response of the filter H(s), the closer the system's impulse response is to a delta function in time. This test adds noise of the form Φ(x) + z, instead of Φ(x + z), since for the latter model there is no noticeable difference in performance. The one-pole model suffers slightly, due to the decay of the time-domain transfer function(s).

Figure 2.33: Plot (a) a block diagram of a realistic integrator, showing the parasitic resistance

Recovery

Matched filter

Analysis versus synthesis
Dictionary choice
Reweighting
Debiasing
Non-linearity correction
Windowing
Further improvements

The reason we use $f_\gamma$ and its conjugate (i.e., the negative frequency) $\bar{f}_\gamma$ is that the measurements b are real-valued. We assume that this is due to the discrete representation of the signal in post-processing. However, due to processing limitations (since the reconstruction algorithms have worse than linear complexity), there is an upper bound on the size N of the discrete signal.

Figure 2.41: Matched filter estimation of an off-grid pure tone. The periodogram estimates (top row) are biased, even after accounting for the norms of the FFT of Φ.

Results

Non-idealities

Noise
Jitter
Quantization
Cross-talk
Clipping
Combining non-idealities

In the Simulink model, three of the blocks have thermal noise, but by far the dominant source is in the LNA. Thermal noise has an absolute effect (constant on the top row), while jitter has a relative effect (constant on the bottom row), as it causes error proportional to the input amplitude. The small signal is also desensitized or even blocked due to nonlinear effects of the large signal [Raz97].

Figure 2.54 shows the results of different levels of LNA noise. Noise power $\eta_{\mathrm{in}}$ is measured in dBm (dB relative to one milliwatt), for 1 Hz bandwidth
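For readers less familiar with the unit, the conversion between watts and dBm can be sketched as follows. The helper names are hypothetical; the 290 K reference temperature is a common convention and is an assumption here, not a value taken from the thesis.

```python
import math


def watts_to_dbm(p_watts):
    """Power in dBm: decibels relative to one milliwatt."""
    return 10.0 * math.log10(p_watts / 1e-3)


def dbm_to_watts(p_dbm):
    """Inverse of watts_to_dbm."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


BOLTZMANN = 1.380649e-23  # J/K

# Thermal noise power k*T in a 1 Hz bandwidth at an assumed 290 K.
thermal_floor_dbm_per_hz = watts_to_dbm(BOLTZMANN * 290.0)
print(round(thermal_floor_dbm_per_hz, 1))  # -174.0
```

Noise power in a wider bandwidth B (in Hz) is obtained by adding 10·log10(B) dB to the per-hertz figure.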

Simulation results

Single pulse
Two pulses
Comparison

At higher frequencies, as shown in Figure 2.67, the results are slightly worse and ENOB is only 8.0 for 200 ns pulses. The frequency- and duration-dependent results may depend slightly on the parameters used for reconstruction, e.g., Gabor dictionary parameters. The amplitude of both is -20 dBFS, which is optimal according to the "sweet spot" calculations in Figure 2.61.
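To put the ENOB figures in context, the sketch below uses the standard relation between effective number of bits and signal-to-noise-and-distortion ratio for a full-scale sine wave. It is an assumption that the thesis uses this same convention; the function names are illustrative.

```python
def enob_from_sinad(sinad_db):
    """Effective number of bits from SINAD in dB (full-scale sine wave)."""
    return (sinad_db - 1.76) / 6.02


def sinad_from_enob(bits):
    """Inverse relation: SINAD in dB that corresponds to a given ENOB."""
    return 6.02 * bits + 1.76


# An ENOB of 8.0 corresponds to roughly 49.9 dB of SINAD under this convention.
print(round(sinad_from_enob(8.0), 2))  # 49.92
```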

Figure 2.64: An input test signal and its reconstruction. Carrier frequency is $f_{\mathrm{in}} = 1907.2$ MHz

Hardware

NG InP version
Version 1
Version 2

Without the shorts, the expected power consumption was 700 mW and the dynamic range of the chip itself was 45 to 50 dB, operating from 0.1 to 3 GHz. Dynamic range is described as a function of the measurements, and is not the same as the dynamic range achieved by reconstruction. The Simulink simulations performed indicate that the system dynamic range is particularly sensitive to the jitter performance of the system clock.

Figure 2.73: Die photo of the NG InP 4-channel RMPI

Recommendations

Contributions

We refer to this algorithm as NESTA - shorthand for Nesterov's algorithm - to acknowledge the fact that it is based on his method. This, together with the accelerated convergence rate of Nesterov's algorithm [Nes05, BT09], makes NESTA a method of choice for solving large-scale problems. Another contribution of this chapter is that it also contains a fairly wide range of numerical experiments comparing various methods on problems involving realistic and challenging data.

Organization of the chapter and notations

More specifically, Section 3.5 presents a comprehensive series of numerical experiments that illustrate the behavior of several state-of-the-art methods, including interior point methods [KKB07], projected gradient techniques [HYZ07, vdBF08, FNW07], fixed point continuation, and iterative thresholding algorithms [HYZ07, YOGD08, BT09]. Before we begin, it is best to give a brief overview of the notation used throughout the chapter. Except when the matrix A is orthogonal, this functional dependence is difficult to calculate [vdBF08].

NESTA

Nesterov’s method to minimize smooth convex functions

If we performed only the second step of the algorithm with $y_{k-1}$ instead of $x_k$, we would obtain a standard first-order technique with a convergence rate of O(1/k). The novelty is that the sequence $z_k$ "takes into account" previous iterations, since step 3 includes the weighted sum of already calculated gradients. The prox-function is usually chosen such that $x_p^c \in Q_p$, which discourages $z_k$ from moving too far from the center $x_p^c$.
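The three-sequence structure described above can be sketched for a simple unconstrained problem. The code below assumes the feasible set is all of R^n, a Euclidean prox-function 0.5·||x − x0||² centered at the starting point, and the weights α_k = (k+1)/2 and τ_k = 2/(k+3) commonly used with Nesterov's 2005 method. The function names and the diagonal quadratic test objective are illustrative assumptions, not the thesis's implementation.

```python
def nesterov_2005(grad, lipschitz, x0, n_iters):
    """Unconstrained Nesterov iteration with prox-function 0.5*||x - x0||^2.

    grad: function returning the gradient as a list of floats.
    lipschitz: Lipschitz constant L of the gradient.
    Returns the last y iterate.
    """
    x = list(x0)
    y = list(x0)
    weighted_grad_sum = [0.0] * len(x0)
    for k in range(n_iters):
        g = grad(x)
        # Step 2: ordinary gradient step from x_k gives y_k.
        y = [xi - gi / lipschitz for xi, gi in zip(x, g)]
        # Step 3: z_k steps from the prox-center using all past gradients.
        alpha = 0.5 * (k + 1)
        weighted_grad_sum = [s + alpha * gi for s, gi in zip(weighted_grad_sum, g)]
        z = [ci - s / lipschitz for ci, s in zip(x0, weighted_grad_sum)]
        # Step 4: x_{k+1} is a convex combination of z_k and y_k.
        tau = 2.0 / (k + 3)
        x = [tau * zi + (1.0 - tau) * yi for zi, yi in zip(z, y)]
    return y


# Illustrative smooth convex objective: a diagonal quadratic with minimum 0.
curvatures = [1.0, 4.0, 10.0]
target = [1.0, -2.0, 3.0]


def objective(x):
    return 0.5 * sum(d * (xi - ti) ** 2 for d, xi, ti in zip(curvatures, x, target))


def gradient(x):
    return [d * (xi - ti) for d, xi, ti in zip(curvatures, x, target)]


y_final = nesterov_2005(gradient, lipschitz=max(curvatures),
                        x0=[0.0, 0.0, 0.0], n_iters=500)
print(objective(y_final))
```

Dropping steps 3 and 4 (that is, setting x_{k+1} = y_k) reduces this to plain gradient descent, which is the O(1/k) baseline referred to in the text.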

Application to compressed sensing

NESTA
Updating $y_k$
Updating $z_k$
Computational complexity
Parameter selection
Accelerating NESTA with continuation
Some theoretical considerations

In this chapter, with the exception of §3.7, we assume that $A^*A$ is an orthogonal projector, i.e., the rows of A are orthonormal. The initial value of the smoothing parameter is $\mu_0 = \|A^* b\|_{\ell_\infty}$ and the terminal value is $\mu_f = 2\sigma$. NESTA with continuation is applied to 10 random trials for varying numbers of continuation steps and different values of the dynamic range.
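A continuation scheme needs a rule for moving the smoothing parameter from its initial value to its terminal value. The sketch below assumes a geometric decrease over a fixed number of steps; the geometric rule, the function name, and the numerical values are illustrative assumptions rather than details confirmed by the text above.

```python
def continuation_schedule(mu0, mu_final, n_steps):
    """Geometrically decreasing smoothing parameters ending at mu_final.

    Returns [mu_1, ..., mu_T] with mu_t = gamma**t * mu0 and
    gamma = (mu_final / mu0) ** (1 / T), so that mu_T equals mu_final
    up to rounding.
    """
    gamma = (mu_final / mu0) ** (1.0 / n_steps)
    return [mu0 * gamma ** t for t in range(1, n_steps + 1)]


schedule = continuation_schedule(mu0=10.0, mu_final=0.02, n_steps=5)
print(schedule)
```

Each stage would then be solved with the current value of µ, warm-started from the previous stage's solution, so that only the final stage works with the small and therefore slowly converging terminal value.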

Figure 3.1: Value of $f_{\mu_f}(x_k)$ as a function of iteration k. Solid line: without continuation

Accurate optimization

Is NESTA accurate?

Then if $x_0$ is sufficiently sparse and if the nonzero entries of $x_0$ are sufficiently large, the solution $x^\star$ to (QP$_\lambda$) is given by (3.4.3). The absolute values of the non-zero entries of $x_0$ are distributed between 1 and $10^5$ so that we have about 100 dB of dynamic range. We then calculate the solution (3.4.3), and make sure it satisfies the KKT optimality conditions for (QP$_\lambda$) so that the optimal solution is known.

Setting up a reference algorithm for accuracy tests

This can also be seen from Figure 3.4, which plots NESTA's solution against the optimal solution, confirming the excellent accuracy of our algorithm. The absolute values of the entries on the support of the optimal solution are plotted. Furthermore, Figure 3.4 shows the entries of the FISTA solution versus those of the optimal solution, and one observes a very good fit (almost perfect if the size of a component of $x^\star$ is greater than 3).

The smoothing parameter µ and NESTA’s accuracy

We plot the absolute values of the entries on the set where the size of the optimal solution exceeds 1. To make sure that the FISTA solution is very close to the optimal solution, we check that the KKT stationarity condition is nearly satisfied. Specifically, notice in Table 3.2 that for this particular experiment, reducing µ by a factor of 10 gives about 1 additional digit of accuracy to the optimal value.
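The dependence of accuracy on µ can be made concrete by looking at the smoothed ℓ1 norm itself. The sketch below assumes the smoothing is the entrywise Huber function that results from Nesterov-style smoothing with a quadratic prox-function; the function names and the test vector are illustrative. For a vector of length n, the gap between the ℓ1 norm and its smoothed version is at most nµ/2, so it shrinks in proportion to µ.

```python
def huber(x, mu):
    """Huber function: quadratic for |x| < mu, shifted absolute value otherwise."""
    ax = abs(x)
    return ax - 0.5 * mu if ax >= mu else x * x / (2.0 * mu)


def smoothed_l1(vec, mu):
    """Smoothed l1 norm: sum of entrywise Huber values."""
    return sum(huber(v, mu) for v in vec)


x = [3.0, -0.5, 0.0, 0.004, -12.0]
l1 = sum(abs(v) for v in x)
# Approximation gap of the smoothed norm for three values of mu.
gaps = {mu: l1 - smoothed_l1(x, mu) for mu in (1.0, 0.1, 0.01)}
for mu, gap in gaps.items():
    print(mu, gap)
```

This bound concerns only the objective function; how the gap translates into digits of accuracy in the computed optimal value is an empirical matter, as in the experiment reported in Table 3.2.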

Figure 3.5: Entries of the computed solutions versus the optimal solution. We plot the absolute values of the entries on the set where the magnitude of the optimal solution exceeds 1.

Numerical comparisons

State-of-the-art methods

NESTA
Gradient projections for sparse reconstruction (GPSR)
Sparse reconstruction by separable approximation (SpaRSA)
Spectral projected gradient (SPGL1)
Fixed point continuation method (FPC)
FPC active set (FPC-AS)
Bregman
Fast iterative soft-thresholding algorithm (FISTA)

Constrained versus unconstrained minimization
Experimental protocol
Numerical results

The case of exactly sparse signals
Approximately sparse signals

Most of the algorithms discussed in this section are considered state-of-the-art in the sense that they are the most competitive among sparse reconstruction algorithms. The solution from FISTA will also be used to assess the accuracy of other algorithms. The results are listed in Tables 3.5 (Crit. 2); the results of using both stopping criteria are almost identical.

Table 3.3: Number of function calls $N_A$ averaged over 10 independent runs. The sparsity level s = m/5 and the stopping rule is Crit

An all-purpose algorithm

Non-standard sparse reconstruction: $\ell_1$ analysis
Numerical results for non-standard $\ell_1$ minimization
Total-variation minimization
Numerical results for TV minimization

The rise time and fall time of the pulse envelope are comparable to those of the Doppler pulse. The diagrams below in Figure 3.7 show the spectrum of the recovered signal using analysis and synthesis, respectively. For TwIST, it is important to note that the number of calls to A and A* should not be taken as a proxy for the computational time of the algorithm.

Figure 3.7: Top: spectrum estimate of the exact signal, no noise. The pure tone at 60 dB and the Doppler radar at 20 dB dominate the 0 dB frequency-hopping pulses

Handling non-projectors

Revisiting the projector case

Non-projectors for $\epsilon = 0$ case

Non-projectors for $\epsilon > 0$ case

Discussion

Extensions

Software

Motivation

The literature

Our approach

Conic formulation
Dualization
Smoothing
First-order methods

Contributions

Software

Organization of the chapter

Conic formulations

Alternate forms

The dual

The differentiable case

Smoothing

Composite forms

Projections

A novel algorithm for the Dantzig selector

The conic form

Smooth approximation

Implementation

Exact penalty

Alternative models

Further instantiations

A generic algorithm

The LASSO

Nuclear-norm minimization

Total-variation minimization

Combining $\ell_1$ analysis and total-variation minimization

Implementing first-order methods

Introduction

The variants

Step size adaptation

Linear operator structure

Accelerated continuation

Strong convexity

Dual-function formulation

Background

Fenchel dual formulation

Convergence

Convergence when f is smooth
Convergence of inner iteration

Convergence of outer iteration

Overall analysis

Numerical experiments

Dantzig selector: comparing first-order variants

LASSO: comparison with SPGL1

Wavelet analysis with total-variation

Matrix completion: expensive projections

Extensions

Automatic restart

Specialized solvers for certain problems

Noiseless basis pursuit
Conic problems in standard form
Matrix completion problems

Software: TFOCS

Discussion

Appendix: exact penalty

Appendix: creating a synthetic test problem

Improvements to TFOCS