
3.1.2 Sparse solutions of a linear system of equations

Given a target signal $y \in \mathbb{R}^m$ and a full-rank matrix $A \in \mathbb{R}^{m \times n}$ with $m < n$, the underdetermined linear system of equations is defined as $y = Ax$. The underdetermined or overcomplete-basis system has fewer equations than unknown variables, which results in infinitely many solutions. In order to narrow the choice to one well-defined solution, additional criteria are needed.

A familiar way to do this is regularization, where a function $J(x)$ is introduced to govern the kind of solution(s) obtained. The general optimization problem is defined as

$\min_{x} J(x)$ subject to $y = Ax$. (3.4)

The unique solution of Equation 3.4 is guaranteed by selecting a strictly convex function $J(\cdot)$.

Considering $J(x) = \|x\|_2^2$, the squared Euclidean norm, the optimization problem is given by

$\min_{x} \|x\|_2^2$ subject to $y = Ax$. (3.5)

The unique solution of Equation 3.5 is given by

$\hat{x} = A^{+} y = A^{\top} (A A^{\top})^{-1} y$ (3.6)

where $A^{+}$ and $A^{\top}$ are the pseudo-inverse and transpose of the matrix $A$, respectively. It is to be noted that the matrix $A$ is assumed to be full rank, and hence the matrix $A A^{\top}$ is positive definite and thus invertible. The solution $\hat{x}$ obtained using $\ell_2$-norm minimization is not a sparse solution.
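For illustration, a minimal NumPy sketch of the minimum $\ell_2$-norm solution in Equation 3.6 is given below; the matrix sizes and random data are arbitrary assumptions, chosen only to show that the resulting $\hat{x}$ satisfies $y = A\hat{x}$ while being fully dense rather than sparse.

```python
import numpy as np

# Minimal sketch: minimum l2-norm solution of an underdetermined system y = Ax
# (Equation 3.6). Sizes and data are illustrative only.
rng = np.random.default_rng(0)
m, n = 20, 50                          # fewer equations than unknowns (m < n)
A = rng.standard_normal((m, n))        # full-rank dictionary matrix
y = rng.standard_normal(m)             # target signal

x_hat = A.T @ np.linalg.solve(A @ A.T, y)   # A^T (A A^T)^{-1} y

print(np.allclose(A @ x_hat, y))               # True: the constraint y = Ax holds
print(np.count_nonzero(np.abs(x_hat) > 1e-10)) # typically n non-zeros, i.e. not sparse
```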

Considering $J(x) = \|x\|_0$, the optimization problem for finding the sparsest solution (sparse code) $\hat{x}$ is formulated as

$\min_{x} \|x\|_0$ subject to $y = Ax$. (3.7)

This is a non-convex combinatorial optimization problem, and it has been proven that obtaining an exact solution to Equation 3.7 is, in general, NP-hard [170]. However, there exist a number of pursuit algorithms for finding an approximate solution. One of the most popular sparse coding algorithms among them is OMP [168]. OMP is an iterative greedy algorithm that, at each step, selects the dictionary atom most correlated with the current residual; a minimal sketch of this procedure is given after the following formulations. In order to obtain an approximate solution, the optimization problem given in Equation 3.7 is reformulated as

$\min_{x} \|x\|_0$ subject to $\|y - Ax\|_2^2 \leq \epsilon$ (3.8)

or

$\min_{x} \|y - Ax\|_2^2$ subject to $\|x\|_0 \leq \gamma$ (3.9)

or

$\min_{x} \frac{1}{2}\|y - Ax\|_2^2 + \lambda_0 \|x\|_0$ (3.10)

where $\epsilon$ is the error bound, and $\gamma$ and $\lambda_0$ are sparsity constraints (i.e., they control the number of non-zero elements). The solution $\hat{x}$ is the sparse representation of the target signal/vector $y$. Sparse representation refers to representing a target signal as a linear combination of only a few dictionary atoms.
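The following is a minimal, illustrative sketch of the OMP selection-and-refit loop, targeting the sparsity-constrained form of Equation 3.9; it assumes the dictionary columns are (approximately) unit-norm and omits the alternative stopping rule based on the error bound $\epsilon$.

```python
import numpy as np

def omp(A, y, sparsity):
    """Greedy OMP sketch: pick the atom most correlated with the residual,
    then re-fit all selected atoms by least squares (illustrative only)."""
    n = A.shape[1]
    residual = y.copy()
    support = []                           # indices of selected dictionary atoms
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        correlations = A.T @ residual      # correlation with the current residual
        support.append(int(np.argmax(np.abs(correlations))))
        A_s = A[:, support]                # sub-dictionary of selected atoms
        coeffs, *_ = np.linalg.lstsq(A_s, y, rcond=None)
        residual = y - A_s @ coeffs        # update the residual
    x_hat = np.zeros(n)
    x_hat[support] = coeffs
    return x_hat
```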

Another well-known pursuit algorithm used for computing sparse representations is basis pursuit (BP) [171]. In BP, the optimization problem is obtained by setting $J(x) = \|x\|_1$. It uses an $\ell_1$-norm constraint instead of the $\ell_0$-norm, which makes the optimization problem convex. The optimization problem based on $\ell_1$-norm minimization is formulated as

$\min_{x} \|x\|_1$ subject to $y = Ax$. (3.11)

This problem can be solved using general-purpose solvers such as the simplex and interior point methods [172] in order $O(n^3)$, which is slow for large-scale problems. If the data $y$ is noisy, the optimization problem given in Equation 3.11 is reformulated as

$\min_{x} \|x\|_1$ subject to $\|y - Ax\|_2^2 \leq \epsilon$ (3.12)

or

$\min_{x} \frac{1}{2}\|y - Ax\|_2^2 + \lambda_1 \|x\|_1$ (3.13)

where $\lambda_1$ is a regularization parameter that controls the trade-off between sparsity and reconstruction fidelity. The problem in Equation 3.13 is called basis pursuit denoising (BPDN) in the signal processing domain. In the statistical community, the optimization problem related to Equation 3.13 is well known as the Lasso [173] and is formulated as

$\min_{x} \|y - Ax\|_2^2$ subject to $\|x\|_1 \leq q$ (3.14)

where $q$ is the $\ell_1$-norm constraint on the variables. Here the matrix $A \in \mathbb{R}^{m \times n}$ is implicitly assumed to have $m > n$, i.e., to represent an overdetermined linear system. The Lasso is an efficient regularization method for estimating the sparse solution in linear models. It regularizes or shrinks a fitted model through an $\ell_1$ penalty or constraint. The problem in Equation 3.13 is equivalent to Equation 3.14 under an appropriate correspondence of parameters: if $\tilde{x}$ is a solution to Equation 3.13 for some $\lambda_1 \geq 0$, it also solves Equation 3.14 for $q = \|\tilde{x}\|_1$ [174]. The Lasso is not robust when predictors (dictionary atoms) are highly correlated: it will arbitrarily choose one and ignore the others. Moreover, the Lasso selects at most $m$ variables when $n \gg m$. To overcome these limitations, the elastic net (ENet) [175] regularized regression method linearly combines the $\ell_1$ and $\ell_2$ penalties of the Lasso and ridge regression (RR), and is given as

$\min_{x} \frac{1}{2}\|y - Ax\|_2^2 + \lambda_1 \|x\|_1 + \frac{\lambda_2}{2} \|x\|_2^2$ (3.15)

where $\lambda_1$ and $\lambda_2$ are positive regularization coefficients. The $\ell_1$ penalty term of Equation 3.15 performs variable selection, while the $\ell_2$ part performs grouped selection and stabilizes the solution paths with respect to random sampling, thereby improving prediction [175]. In the ENet, a group of highly correlated variables tends to have coefficients of the same magnitude. When $n \gg m$, the ENet can select more than $m$ variables, unlike the Lasso. The ENet problem reduces to RR when $\lambda_1 = 0$ and to the Lasso when $\lambda_2 = 0$. Equation 3.15 can be transformed into an equivalent Lasso problem as

$\hat{x} = \arg\min_{x} \frac{1}{2}\|\tilde{y} - \tilde{A}x\|_2^2 + \lambda_1 \|x\|_1$ (3.16)

where

$\tilde{y} = \begin{bmatrix} y \\ 0 \end{bmatrix}_{(m+n) \times 1}$ and $\tilde{A} = \begin{bmatrix} A \\ \sqrt{\lambda_2}\, I \end{bmatrix}_{(m+n) \times n}$. (3.17)

Note that the dimension of $\tilde{y}$ is $(m+n)$ and $\tilde{A}$ has rank $n$. This means that the elastic net can select all $n$ variables in all situations, and thus overcomes the stated limitation of the Lasso. The details of sparse coding algorithms such as OMP and a modified version of LAR, referred to as LARS, are provided in Appendix A. The Lasso-based regularization problem can be efficiently solved using the LARS algorithm, and the same algorithm can be used to solve the ENet-based regularization problem after transforming it into a Lasso-based one, as in Equations 3.16 and 3.17.
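As an illustration of the augmentation in Equation 3.17, the following minimal sketch builds $\tilde{y}$ and $\tilde{A}$ and then solves the resulting Lasso problem; for brevity it uses a basic ISTA (proximal gradient) iteration in place of the LARS algorithm discussed in Appendix A, and the iteration count is an arbitrary assumption.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, y, lam, n_iter=1000):
    """ISTA sketch for min_x 0.5*||y - A x||_2^2 + lam*||x||_1 (Equation 3.13)."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * (A.T @ (A @ x - y)), step * lam)
    return x

def enet_via_lasso(A, y, lam1, lam2):
    """Solve the ENet problem of Equation 3.15 via the augmented Lasso
    problem of Equations 3.16-3.17."""
    m, n = A.shape
    y_aug = np.concatenate([y, np.zeros(n)])            # (m+n) x 1 augmented target
    A_aug = np.vstack([A, np.sqrt(lam2) * np.eye(n)])   # (m+n) x n augmented dictionary
    return ista_lasso(A_aug, y_aug, lam1)
```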

The success of any sparse coding algorithm depends on the choice of matrix or dictionary A.

In general, dictionaries are either parametric (analytic) or data-driven. Examples of analytic dictionaries include the Fourier [176], wavelet [177], curvelet [178] and contourlet [179] transforms, where a pre-specified set of mathematical functions is employed to represent the target data. The resulting dictionary usually leads to simple and fast algorithms which do not involve explicit multiplication by the dictionary matrix for computing the sparse representation [176]. A major drawback of analytic dictionaries is that their model functions are limited and over-simplistic compared to the complexity of many natural signals. On the other hand, data-driven dictionaries are derived from the training data itself (i.e., an exemplar dictionary) or using a learning algorithm. The method of optimal directions (MOD) [180], K-SVD [160], D-KSVD [165], LC-KSVD [166] and the OL dictionary [181] are a few examples of dictionary learning algorithms. In the subsequent sections, we explore the use of sparse representation for the LR task, considering an exemplar dictionary and various learned dictionaries such as K-SVD, D-KSVD, LC-KSVD, and OL. The steps involved in learning these dictionaries are provided in Appendix B. The proposed sparse representation based LR systems are contrasted with i-vector based LR systems employing various classifiers such as CDS, SVM, MLR, G-PLDA, and GG.
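To make the exemplar-dictionary idea concrete, the following is a purely illustrative sketch in which training vectors (e.g., i-vectors) are stacked as $\ell_2$-normalized columns to form the dictionary, and a test vector is sparse-coded with the `omp` sketch given earlier; the dimensions, the number of training vectors, and the `sparsity` value are arbitrary assumptions.

```python
import numpy as np

# Hypothetical training matrix: each column is one training vector (e.g., an i-vector).
rng = np.random.default_rng(1)
X_train = rng.standard_normal((400, 1000))      # 400-dim vectors, 1000 training examples

# Exemplar dictionary: the training vectors themselves, with unit-norm atoms.
D = X_train / np.linalg.norm(X_train, axis=0, keepdims=True)

# Sparse code of a test vector using the omp() sketch defined earlier.
y_test = rng.standard_normal(400)
x_code = omp(D, y_test, sparsity=20)            # at most 20 non-zero coefficients
print(np.count_nonzero(x_code))
```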