This thesis presents in detail the design of one of the world's first compressed sensing hardware devices, the random modulation pre-integrator (RMPI). The general concept of the RMPI is not original to this thesis (see [SOS+05, KLW+06, LKD+07, TLD+10]), but the device described here is one of the world's first working hardware devices built on the compressed sensing paradigm.
Applications
Problems of interest
In the case that A is ill-conditioned, a classical technique to regularize the problem (known as Tikhonov regularization or ridge regression) is: $\hat{x} = \operatorname{argmin}_x \|Ax-b\|_2^2 + \gamma^2\|x\|_2^2 = (A^T A + \gamma^2 I)^{-1} A^T b$ (1.1.2). The second equality holds when γ is large enough that $A^T A + \gamma^2 I$ is invertible. The last canonical problem we discuss is the Dantzig selector [CT07a], which motivated the development of the TFOCS algorithm.
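The Tikhonov (ridge) estimate can be sketched numerically as follows. This is a minimal illustration, assuming the standard form min ‖Ax − b‖² + γ²‖x‖²; the function name `tikhonov`, the matrix sizes, the seed, and the value of γ are all hypothetical and not taken from the thesis.

```python
import numpy as np

def tikhonov(A, b, gamma):
    """Solve min_x ||A x - b||_2^2 + gamma^2 ||x||_2^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + gamma**2 * np.eye(n), A.T @ b)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
# Make two columns nearly dependent so that A is ill-conditioned.
A[:, 4] = A[:, 3] + 1e-8 * rng.standard_normal(20)
b = rng.standard_normal(20)

gamma = 0.1  # illustrative regularization weight
x_ridge = tikhonov(A, b, gamma)

# Equivalent formulation: ordinary least squares on a stacked system [A; gamma*I].
A_aug = np.vstack([A, gamma * np.eye(5)])
b_aug = np.concatenate([b, np.zeros(5)])
x_stacked = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
```

Both routes should give the same vector up to rounding; the stacked form is often preferred in practice because it avoids forming AᵀA explicitly.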
Historical development
- Classical signal processing
- Estimation and the rise of alternatives to least-squares
- Leading up to compressed sensing
- Compressed Sensing
Its idea of exploiting sparsity and finishing has been extremely useful in astronomy; to quote from [Cor09], "The importance of CLEAN in radio astronomy has been enormous." The RIP-p is defined in the same way as the RIP, except that it uses the ℓp norm in place of the ℓ2 norm.
The need for the RMPI
The work in [BGI+08] introduces a simple extension of the RIP to the so-called RIP-p, and in particular the RIP-1. Another extension of the RIP is the model-based RIP [BCDH10], which is simply the RIP restricted to signals in a specific model; if the model is the set of k-sparse signals, then this reduces to the usual RIP.
Principles of the RMPI
The NUS
Therefore, the NUS chooses Ω via a pseudo-random number generator (note that Ω must of course be known when reconstructing the signal). The NUS may take samples that are only ∆T apart, so at its fastest it must still sample at the rate $f_s$.
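The role of the pseudo-random generator can be illustrated with a short sketch: the sampler and the reconstruction side both derive the same Ω from a shared seed. The grid size, grid spacing, tone frequency, seed, and the helper name `nus_sample_indices` are assumptions for illustration only; a real NUS may also impose constraints on the gaps between samples that are ignored here.

```python
import numpy as np

def nus_sample_indices(n_grid, n_samples, seed):
    """Pick n_samples distinct grid indices (the set Omega) from a seeded PRNG."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_grid, size=n_samples, replace=False))

n_grid = 1000            # hypothetical number of grid points in the window
delta_t = 1.0 / 5e9      # hypothetical grid spacing Delta T, in seconds
omega = nus_sample_indices(n_grid, n_samples=100, seed=42)

t = np.arange(n_grid) * delta_t
x = np.cos(2 * np.pi * 1.0e9 * t)    # example 1 GHz tone on the full grid
samples = x[omega]                    # the values a nonuniform sampler would keep

# The reconstruction side regenerates the same Omega from the same seed.
omega_rx = nus_sample_indices(n_grid, n_samples=100, seed=42)
min_gap = np.diff(omega).min()        # smallest spacing, in units of Delta T
```

Because indices are drawn without replacement from the grid, the smallest possible gap is one grid step, which is why the sampling circuitry must still be able to run at the full grid rate.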
The RMPI
The goal of the RMPI design is to approximate a random signed Bernoulli matrix as closely as possible. For a given number of measurements per unit time, M/T (i.e., the information rate), a decrease in the number of channels can be compensated for by an increase in the sampling rate.
Optimization background
The first is nonlinear conjugate gradient; when this can be used it is likely to be fast. It is used to solve the non-convex rank minimization problem in [BM03] and more recently in [MT10].
Reading guide
Signal class
For now, these signals are not part of the signal model and do not affect us. The goal is an ambitious one: complete reconstruction of the signal to the accuracy allowed by the finite model.
The design
Basic design
These ideas can be interpreted as the RMPI analogue of the “minimum distance” concept used in coding theory [McE02]. The inputs are shown at the top left and are then convolved with the spectrum of the PRBS (shown in black below).
Theoretical performance
- Input noise and channelized receivers
For the RMPI, the mean squared column norm of $\Phi_{\text{RMPI}}$ is $r_{ch}$ (this is just $\|\Phi_{\text{RMPI}}\|_F^2/n = \operatorname{trace}(\Phi_{\text{RMPI}}^T \Phi_{\text{RMPI}})/n$). Channelized receivers are the current technology for high-bandwidth systems; see Figure 2.6 for a frequency domain description.
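The identity between the mean squared column norm, the scaled Frobenius norm, and the scaled trace can be checked numerically. The sketch below uses a dense ±1 matrix as a stand-in; it is not the actual RMPI matrix, and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 32                                   # hypothetical sizes
Phi = rng.choice([-1.0, 1.0], size=(m, n))     # stand-in +/-1 matrix

col_norms_sq = np.sum(Phi**2, axis=0)          # squared norm of each column
mean_sq_col_norm = col_norms_sq.mean()
via_frobenius = np.linalg.norm(Phi, "fro")**2 / n
via_trace = np.trace(Phi.T @ Phi) / n
```

For a dense ±1 matrix every column has squared norm m, so all three quantities equal the number of rows.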
Related literature
- Other RMPI systems
- Related systems
The “Xampling” methodology [ME10, MEE09, Eld09, MEDS11], which builds on the authors' blind multiband sampling theorem [ME09a], is similar in many respects to the RMPI. However, below we describe some minor criticisms of the MWC and address some comments from [MEE09] about the RMPI. The extent to which this aliased signal changes over time is determined by the “Nyquist zone” in which the carrier signal was located.
Modeling the system
- Simple models
- SPICE
- Simulink
- Calibration
- Phase blind calibration
The frequency response of the system at a frequency f is H(if), since it is just the Fourier transform of h. The simplest model of the system is a block-diagonal matrix with ±1 entries on the non-zero portions. The problem with this procedure is that the phase θ of the incoming signal is unknown.
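The simplest block-diagonal ±1 model can be sketched for a single channel as below. This assumes an ideal integrate-and-dump with non-overlapping windows and ignores the filter response H entirely; the function name and the sizes are hypothetical.

```python
import numpy as np

def block_diagonal_pm1(n_samples, chips_per_sample, seed=0):
    """One channel: row j holds +/-1 chips over the j-th integration window, zeros elsewhere."""
    rng = np.random.default_rng(seed)
    n = n_samples * chips_per_sample
    Phi = np.zeros((n_samples, n))
    for j in range(n_samples):
        cols = slice(j * chips_per_sample, (j + 1) * chips_per_sample)
        Phi[j, cols] = rng.choice([-1.0, 1.0], size=chips_per_sample)
    return Phi

Phi = block_diagonal_pm1(n_samples=4, chips_per_sample=5)
x = np.ones(Phi.shape[1])   # toy input on the fine time grid
y = Phi @ x                 # each entry sums the chip-modulated input over one window
```

A multi-channel model under the same assumptions would stack several such matrices, each with its own chip sequence.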
General design considerations
Test signals
The performance of the system was primarily evaluated on its accuracy in reconstructing single pulses, although multiple pulses were occasionally tested. The simulations consisted of either "corner simulations" using extreme types of pulses (see Figure 2.18) or a combination of systematic parameters, such as pulse length, and random parameters, such as phase and frequency of the carrier wave. The pulse window was occasionally changed as this affects the spectral sparsity of the reconstruction.
Error metrics
This limit is always smaller than the dynamic range of a single pulse, as the large pulse introduces extra difficulties for the small pulse, such as creating extra errors due to jitter, as well as creating non-linear effects. Our demodulation method works well in multi-signal environments as the filtering discards the background signals. Errors are then recorded as a function of how far the estimated frequency is from the true frequency, |f−f0|.
Number of channels
The main disadvantage is that it requires an accurate estimate of the frequency, but the method is not very sensitive to errors in this estimate, so we find that it works well in practice. Consider a single-pole integrator, and assume the time constant (which is the inverse of the pole location) is fixed regardless of the number of channels. Because of the full reconstruction, we are able to examine the root-mean-square error (RMSE).
Chip sequence
Spectral properties of the chip sequence
- Infinite period
- Finite period
We wish to find the power spectral density (PSD) of a random chip sequence of infinite length. The theorem is useful because stationary signals are not necessarily square integrable, so care must be taken when using the Fourier transform. The insight is that a time shift of c(t) will have the same PSD (since it only changes the phase of the Fourier transform), so we will consider shifting the chip sequence by a random amount.
Chip design considerations
- Chip sequence rate
- Chip sequence period
- Case study: test of NG chip sequence
For the left column, the energy of the ADC samples in the blue tone was 0.69 times that of the red tone. Noise is included in the measurement; these data are a subset of the data in Figure 2.30, presented differently. Failure Criterion 1: the peak of the spectrum was not close to the actual carrier frequency.
Integration
General constraints
- Northrop Grumman integrator design
- Multipole systems
The wider the flat region in the response of the filter H(s), the closer the system is to a delta function in time. This test adds noise of the form Φ(x) + z, instead of Φ(x + z), since for the latter model there is no noticeable difference in performance. The one-pole model suffers slightly, due to the decay of the time-domain transfer function(s).
Recovery
Matched filter
- Analysis versus synthesis
- Dictionary choice
- Reweighting
- Debiasing
- Non-linearity correction
- Windowing
- Further improvements
The reason we use $f_\gamma$ and its conjugate (i.e., the negative frequency) $\bar{f}_\gamma$ is that the measurements b are real-valued. We assume that this is due to the discrete representation of the signal in post-processing. However, due to processing limitations (since the reconstruction algorithms so far have super-linear complexity), there is an upper bound on the size N of the discrete signal.
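The need to pair each frequency with its conjugate can be seen in a small numerical sketch: a complex exponential combined with its conjugate, using conjugate coefficients, gives a real-valued signal. The length N, the bin k, and the coefficient are arbitrary illustrative values, and a plain complex exponential stands in for a dictionary atom.

```python
import numpy as np

N = 64                          # hypothetical discrete signal length
k = 5                           # hypothetical frequency bin
t = np.arange(N)
atom = np.exp(2j * np.pi * k * t / N)     # complex exponential at the positive frequency
coef = 0.3 - 0.4j                          # arbitrary complex coefficient

# Pairing the atom with its conjugate (the negative frequency) yields a real signal.
x = coef * atom + np.conj(coef) * np.conj(atom)

X = np.fft.fft(x)   # energy appears at bins k and N - k, with conjugate coefficients
```

Using only the positive-frequency atom would leave a nonzero imaginary part, which real-valued measurements cannot represent.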
Results
Non-idealities
- Noise
- Jitter
- Quantization
- Cross-talk
- Clipping
- Combining non-idealities
In the Simulink model, three of the blocks have thermal noise, but by far the dominant source is the LNA. Thermal noise has a constant absolute effect (constant on the top row), while jitter has a constant relative effect (constant on the bottom row), as it causes error proportional to the input amplitude. The small signal is also desensitized or even blocked due to nonlinear effects of the large signal [Raz97].
Simulation results
- Single pulse
- Two pulses
- Comparison
At higher frequencies, as shown in Figure 2.67, the results are slightly worse and the ENOB is only 8.0 for 200 ns pulses. The frequency- and duration-dependent results may depend slightly on the parameters used for reconstruction, e.g., the Gabor dictionary parameters. The amplitude of both is -20 dBFS, which is optimal according to the "sweet spot" calculations in Figure 2.61.
Hardware
- NG InP version
- Version 1
- Version 2
Without the shorts, the expected power consumption was 700 mW and the dynamic range of the chip itself was 45 to 50 dB, operating from 0.1 to 3 GHz. Dynamic range is described as a function of the measurements, and is not the same as the dynamic range achieved by reconstruction. The Simulink simulations indicate that the system dynamic range is particularly sensitive to the jitter performance of the system clock.
Recommendations
Contributions
We refer to this algorithm as NESTA - shorthand for Nesterov's algorithm - to acknowledge the fact that it is based on his method. This, together with the accelerated convergence rate of Nesterov's algorithm [Nes05, BT09], makes NESTA a method of choice for solving large-scale problems. Another contribution of this chapter is that it also contains a fairly wide range of numerical experiments comparing various methods on problems involving realistic and challenging data.
Organization of the chapter and notations
More specifically, Section 3.5 presents a comprehensive series of numerical experiments that illustrate the behavior of several state-of-the-art methods, including interior point methods [KKB07], projected gradient techniques [HYZ07, vdBF08, FNW07], fixed point continuation, and iterative thresholding algorithms [HYZ07, YOGD08, BT09]. Before we begin, it is best to give a brief overview of the notation used throughout the chapter. Except when the matrix A is orthogonal, this functional dependence is difficult to calculate [vdBF08].
NESTA
Nesterov’s method to minimize smooth convex functions
If we performed only the second step of the algorithm, with $y_{k-1}$ in place of $x_k$, we would obtain a standard first-order technique with a convergence rate of O(1/k). The novelty is that the sequence $z_k$ "keeps in mind" the previous iterations, since step 3 involves a weighted sum of the already computed gradients. The prox-function is usually chosen such that $x_p^c \in Q_p$, which discourages $z_k$ from moving too far from the center $x_p^c$.
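The interplay of the gradient step and the weighted sum of past gradients can be sketched for the unconstrained case. This is a minimal illustration, assuming the prox-function ½‖x − x_c‖² (so the prox-center is x_c), weights (k+1)/2 on the gradients, and averaging weight 2/(k+3), which is the usual form of Nesterov's 2005 scheme; the least-squares test problem, sizes, and function name are hypothetical and not the thesis's code.

```python
import numpy as np

def nesterov_smooth(grad, L, x_c, n_iter):
    """Nesterov's scheme for a smooth convex f with L-Lipschitz gradient (unconstrained)."""
    x = x_c.copy()
    y = x_c.copy()
    weighted_grad_sum = np.zeros_like(x_c)
    for k in range(n_iter):
        g = grad(x)
        y = x - g / L                               # step 2: plain gradient step from x_k
        weighted_grad_sum += 0.5 * (k + 1) * g      # accumulate alpha_k * grad, alpha_k = (k+1)/2
        z = x_c - weighted_grad_sum / L             # step 3: uses all past gradients
        tau = 2.0 / (k + 3)
        x = tau * z + (1.0 - tau) * y               # next x_k: weighted average of z_k and y_k
    return y

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
f = lambda v: 0.5 * np.sum((A @ v - b) ** 2)
grad = lambda v: A.T @ (A @ v - b)
L = np.linalg.norm(A, 2) ** 2                       # Lipschitz constant of the gradient

x0 = np.zeros(10)
n_iter = 200
y = nesterov_smooth(grad, L, x0, n_iter)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
gap = f(y) - f(x_star)
# O(1/k^2) guarantee for this choice of prox-function, after n_iter iterations.
bound = 2.0 * L * np.sum((x_star - x0) ** 2) / (n_iter * (n_iter + 1))
```

Dropping the z update and iterating only the gradient step recovers ordinary gradient descent, whose guarantee decays like O(1/k) rather than O(1/k²).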
Application to compressed sensing
- NESTA
- Updating y_k
- Updating z_k
- Computational complexity
- Parameter selection
- Accelerating NESTA with continuation
- Some theoretical considerations
In this chapter, with the exception of §3.7, we assume that A*A is an orthogonal projector, i.e., the rows of A are orthonormal. The initial value of the smoothing parameter is $\mu_0 = \|A^* b\|_{\ell_\infty}$ and the terminal value is $\mu_f = 2\sigma$. NESTA with continuation is applied to 10 random trials for varying numbers of continuation steps and different values of the dynamic range.
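The projector assumption and a continuation schedule can be sketched as follows. The geometric decrease from µ0 to µf used here is an assumption made for illustration (the excerpt above fixes only the two endpoints), and the sizes, noise level σ, and number of steps T are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 16, 64                                       # hypothetical sizes
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))    # Q has orthonormal columns
A = Q.T                                             # so A has orthonormal rows
P = A.T @ A                                         # A*A, an orthogonal projector

sigma = 0.01                                        # hypothetical noise level
b = A @ rng.standard_normal(n) + sigma * rng.standard_normal(m)

mu_0 = np.linalg.norm(A.T @ b, np.inf)              # initial smoothing parameter
mu_f = 2 * sigma                                    # terminal smoothing parameter
T = 5                                               # number of continuation steps
# Assumed geometric schedule: mu_t = mu_0 * (mu_f / mu_0) ** (t / T), t = 0..T.
mus = mu_0 * (mu_f / mu_0) ** (np.arange(T + 1) / T)
```

Each value in `mus` would be used for one solve, warm-started from the previous solution.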
Accurate optimization
Is NESTA accurate?
Then if x0 is sufficiently sparse and if the nonzero entries of x0 are sufficiently large, the solution x⋆ to (QPλ) is given by (3.4.3). The absolute values of the nonzero entries of x0 are distributed between 1 and 10^5 so that we have about 100 dB of dynamic range. We then calculate the solution (3.4.3), and make sure it satisfies the KKT optimality conditions for (QPλ) so that the optimal solution is known.
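The 100 dB figure follows from 20·log10(10^5 / 1) = 100. The sketch below generates nonzero entries with magnitudes between 1 and 10^5 and computes the resulting dynamic range; the log-uniform distribution, the sparsity level, and the helper name are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 50                                             # hypothetical number of nonzeros
# Assumed distribution: magnitudes spread log-uniformly between 1 and 1e5.
magnitudes = 10 ** rng.uniform(0, 5, size=k)
signs = rng.choice([-1.0, 1.0], size=k)
nonzeros = signs * magnitudes

def dynamic_range_db(v):
    """Ratio of largest to smallest magnitude, in decibels (amplitude convention)."""
    a = np.abs(v)
    return 20 * np.log10(a.max() / a.min())

full_range_db = dynamic_range_db(np.array([1.0, 1e5]))   # extreme case: exactly 100 dB
sample_range_db = dynamic_range_db(nonzeros)             # at most 100 dB for any draw
```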
Setting up a reference algorithm for accuracy tests
This can also be seen from Figure 3.4, which plots NESTA's solution against the optimal solution, confirming the excellent accuracy of our algorithm. The absolute values of the entries on the support of the optimal solution are plotted. Furthermore, Figure 3.4 plots the entries of the FISTA solution against those of the optimal solution, and one observes a very good fit (nearly perfect when the magnitude of a component of x⋆ is greater than 3).
The smoothing parameter µ and NESTA’s accuracy
We plot the absolute values of the entries on the set where the magnitude of the optimal solution exceeds 1. To make sure that the FISTA solution is very close to the optimal solution, we check that the KKT stationarity condition is nearly satisfied. Specifically, notice in Table 3.2 that for this particular experiment, reducing µ by a factor of 10 gives about 1 additional digit of accuracy in the optimal value.
Numerical comparisons
- State-of-the-art methods
- NESTA
- Gradient projections for sparse reconstruction (GPSR)
- Sparse reconstruction by separable approximation (SpaRSA)
- Spectral projected gradient (SPGL1)
- Fixed point continuation method (FPC)
- FPC active set (FPC-AS)
- Bregman
- Fast iterative soft-thresholding algorithm (FISTA)
- Constrained versus unconstrained minimization
- Experimental protocol
- Numerical results
- The case of exactly sparse signals
- Approximately sparse signals
Most of the algorithms discussed in this section are considered state-of-the-art in the sense that they are the most competitive among sparse reconstruction algorithms. The solution from FISTA will also be used to assess the accuracy of other algorithms. The results are listed in Tables 3.5 (Crit. 2); the results of using both stopping criteria are almost identical.
An all-purpose algorithm
- Non-standard sparse reconstruction: ℓ1 analysis
- Numerical results for non-standard ℓ1 minimization
- Total-variation minimization
- Numerical results for TV minimization
The rise time and fall time of the pulse envelope are comparable to the Doppler pulse. The bottom plots in Figure 3.7 show the spectrum of the recovered signal using analysis and synthesis, respectively. For TwIST, it is important to note that the number of calls to A and A* should not be taken as a proxy for the computational time of the algorithm.
Handling non-projectors
Revisiting the projector case
Non-projectors for ε = 0 case
Non-projectors for ε > 0 case
Discussion
Extensions
Software
Motivation
The literature
Our approach
- Conic formulation
- Dualization
- Smoothing
- First-order methods
Contributions
Software
Organization of the chapter
Conic formulations
Alternate forms
The dual
The differentiable case
Smoothing
Composite forms
Projections
A novel algorithm for the Dantzig selector
The conic form
Smooth approximation
Implementation
Exact penalty
Alternative models
Further instantiations
A generic algorithm
The LASSO
Nuclear-norm minimization
Total-variation minimization
Combining ℓ1 analysis and total-variation minimization
Implementing first-order methods
Introduction
The variants
Step size adaptation
Linear operator structure
Accelerated continuation
Strong convexity
Dual-function formulation
Background
Fenchel dual formulation
Convergence
- Convergence when f is smooth
- Convergence of inner iteration
Convergence of outer iteration
- Overall analysis
Numerical experiments
Dantzig selector: comparing first-order variants
LASSO: comparison with SPGL1
Wavelet analysis with total-variation
Matrix completion: expensive projections
Extensions
Automatic restart
Specialized solvers for certain problems
- Noiseless basis pursuit
- Conic problems in standard form
- Matrix completion problems
Software: TFOCS
Discussion
Appendix: exact penalty
Appendix: creating a synthetic test problem
Improvements to TFOCS