

While in two dimensions it is trivial to find the tightest simplicial cone and characterize the space of admissible solutions, it is not so straightforward to do so in higher dimensions. For example, the simplicial cone may also be defined by points that are two-sparse (lying on hyperplanes defined by pairs of axes) instead of one-sparse. On the other hand, the loosest possible simplicial cone is always defined by the axes, as a simplicial cone cannot be larger than the positive orthant (otherwise the basis vectors would have negative entries).

If the data were to fill the positive orthant such that the tightest simplicial cone pushed close to the loosest, then the NMF solution would be unique, and the axes themselves would be the basis, as shown on the right of Figure 7.1. A detailed mathematical treatment of the uniqueness of NMF solutions can be found in [91], but the gist of it is that uniqueness requires the data matrix to be sparse. Our main insight is that uniqueness of the solution can be achieved by translating the goal to finding a transform such that the transformed data fills its positive orthant. A simplicial cone containing such data and representing an NMF solution has no space to grow. It is straightforward to conclude that a sufficient condition for this to happen is that every axis has at least one data point on it, as illustrated on the right of Figure 7.1 with transformed bases d1 and d2 obtained under a sparsifying transform D. Such data points are sparse in their mixing coefficients, as the contribution of the other axes to their reconstruction is precisely zero, as suggested by Gillis [91].
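As a quick illustration, this sufficient condition can be checked numerically. The following is a minimal NumPy sketch (the function name and tolerance are ours, for illustration) that tests whether every axis of the positive orthant carries at least one one-sparse data point, i.e. whether a simplicial cone containing the data has no room to grow.

```python
import numpy as np

def fills_positive_orthant(Y, tol=1e-10):
    """Check the sufficient uniqueness condition: for each coordinate q,
    some column of the non-negative R x N matrix Y is (numerically)
    one-sparse with its support on axis q."""
    R, N = Y.shape
    for q in range(R):
        # columns whose energy is concentrated on axis q alone
        on_axis_q = np.any(
            (Y[q, :] > tol) &
            (np.delete(Y, q, axis=0).max(axis=0) <= tol)
        )
        if not on_axis_q:
            return False
    return True

# Toy example: one one-sparse point per axis plus two mixtures.
Y = np.array([[1.0, 0.0, 0.3, 0.7],
              [0.0, 1.0, 0.7, 0.3]])
print(fills_positive_orthant(Y))  # True: the NMF cone cannot grow
```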

Endmember spectra in hyperspectral imaging are highly correlated and are not sparse in the case of reflectance data. Assuming a dictionary that provides a non-negative sparse representation of each endmember, every linear mixture of these endmembers admits a non-negative sparse representation in this dictionary, with a support that is the union of the supports of all the endmembers.

The identification of this support for the mixture data can then be formulated as a sparse representation in the multiple measurement vector (MMV) framework [100] as

\[
X = F\tilde{X}. \tag{7.1}
\]

Each column vector x_n of the data matrix X represents the spectrum of the n-th image pixel in L spectral bands; F ∈ R^{L×Q} represents the dictionary containing Q non-negative atoms. All elements of the vector x̃_n ∈ R^{Q×1} (the n-th column of matrix X̃) are non-negative, to maintain the physical meaning of the reconstructed spectra. Moreover, in the MMV framework all the signals are decomposed jointly by selecting the same atoms (support). The problem lies in defining the dictionary F and proposing an algorithm for the estimation of the non-negative sparse data matrix X̃.
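To make the notation concrete, the sketch below builds a synthetic instance of the model in equation (7.1); the dimensions and the random dictionary are illustrative assumptions, not data from the chapter. All N coefficient vectors share the same support, matching the MMV assumption that mixtures use the union of the endmember supports.

```python
import numpy as np

rng = np.random.default_rng(0)

L, Q, N, K = 224, 50, 100, 4   # bands, atoms, pixels, common support size

# Non-negative dictionary F (illustrative random atoms).
F = rng.random((L, Q))

# Jointly sparse coefficients: all pixels share the same K-atom support.
support = rng.choice(Q, size=K, replace=False)
X_tilde = np.zeros((Q, N))
X_tilde[support, :] = rng.random((K, N))

X = F @ X_tilde            # data matrix, equation (7.1)
assert np.all(X_tilde >= 0) and X.shape == (L, N)
```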


7.3.1 Joint non-negative sparse representation

The goal is to minimize an objective function that combines data fidelity and a sparsity-inducing penalty. A common choice is the regularized least-squares objective function

\[
J_\lambda(F,\tilde{X}) = \|X - F\tilde{X}\|_2^2 + \lambda \sum_{q=1}^{Q} \big\| (\tilde{X}_{qk})_{k \in \Omega} \big\|,
\qquad \tilde{X}_{qk} \ge 0 \quad \forall q, \; \forall k \in \Omega
\tag{7.2}
\]

where Ω = {1, 2, ..., N} represents the index set of the N observation signals. The problem in equation (7.2) is known as the sparse representation of a multiple measurement vector. Recent work on theory and algorithms for solving the MMV problem under joint sparsity constraints can be found in [101] and the references therein.
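For reference, the objective of equation (7.2) can be evaluated as below. Since the exact joint-sparsity norm is not fixed here, this sketch assumes a row-wise ℓ2 mixed norm coupling the columns indexed by Ω; any norm playing that coupling role could be substituted.

```python
import numpy as np

def objective(X, F, X_tilde, lam):
    """Regularized least-squares objective in the spirit of equation (7.2).
    The penalty is a row-wise l2 mixed norm (an assumption): it couples the
    columns of X_tilde so that whole rows are driven to zero jointly."""
    fidelity = np.linalg.norm(X - F @ X_tilde) ** 2       # ||X - F X~||^2
    penalty = np.sum(np.linalg.norm(X_tilde, axis=1))     # sum_q ||(X~_qk)_k||
    return fidelity + lam * penalty
```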

In order to solve the MMV problem under non-negativity constraints, we adopt here an approach based on a greedy procedure whose main steps are summarized in Algorithm 5 (a sketch in code follows the algorithm below). To accomplish steps 1 and 3 of Algorithm 5 at a reduced numerical cost, we propose to use the orthogonal matching pursuit (OMP) algorithm [102] for atom selection and a non-negativity constrained interior-point least-squares method [103] for the estimation of the decomposition coefficients.

Algorithm 5 Joint Non-negative Sparse Representation
Input: Data matrix X ∈ R^{L×N}; dictionary F ∈ R^{L×Q}
Output: Sparse decomposition matrix X̃ ∈ R^{Q×N}

1: Extract the common support set of X in F and denote its cardinality by K.

2: Form a sub-dictionary F̂ containing only the K atoms associated with the common support.

3: Solve the non-negativity constrained least-squares problem of equation (7.2) with F̂ (instead of F).

4: Form the sparse Q×N decomposition matrix X̃ by setting the coefficient values at indices in the common support set equal to those obtained in step 3, and the rest equal to zero.

5: Return the sparse decomposition matrix X̃ ∈ R^{Q×N}.
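A compact sketch of Algorithm 5 follows. It uses a simultaneous-OMP-style greedy selection for the common support (steps 1 and 2) and SciPy's nnls as a stand-in for the interior-point non-negative least-squares solver of [103] (step 3); the function name and the stopping rule (a fixed cardinality K) are our assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def joint_nn_sparse_representation(X, F, K):
    """Sketch of Algorithm 5: greedy common-support extraction shared by
    all columns of X, then non-negative least squares on the sub-dictionary."""
    L, N = X.shape
    Q = F.shape[1]
    Fn = F / np.linalg.norm(F, axis=0)      # unit-norm atoms for correlation
    residual = X.copy()
    support = []
    for _ in range(K):                      # step 1: common support of X in F
        corr = np.abs(Fn.T @ residual).sum(axis=1)  # aggregate over pixels
        corr[support] = -np.inf             # do not reselect atoms
        support.append(int(np.argmax(corr)))
        F_sub = F[:, support]               # step 2: sub-dictionary F_hat
        # orthogonal projection of X onto the selected atoms
        coeffs, *_ = np.linalg.lstsq(F_sub, X, rcond=None)
        residual = X - F_sub @ coeffs
    # step 3: non-negativity constrained LS on F_hat, column by column
    X_tilde = np.zeros((Q, N))              # step 4: off-support rows stay zero
    for n in range(N):
        coeffs_n, _ = nnls(F_sub, X[:, n])
        X_tilde[support, n] = coeffs_n
    return X_tilde                          # step 5

# Usage with the synthetic model above:
# X_hat = F @ joint_nn_sparse_representation(X, F, K)
```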


7.3.2 Dictionary learning

The dictionary on which the sparse representation is performed can be set using two strategies: (i) Analytical dictionary: each atom in the dictionary is chosen as a Gaussian pattern with varying mean and variance. (ii) Adaptive dictionary: the dictionary is obtained using specialized dictionary learning algorithms that take the positivity constraint into account. For this chapter, the dictionary was provided by the non-negative sparse coding algorithm [104], taking all the spectral signatures from the USGS (U.S. Geological Survey) library (500 spectra in 224 spectral bands) [105] as the data matrix X.
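As a rough indication of how such an adaptive dictionary can be produced, the following sketch implements non-negative sparse coding in the spirit of [104], with a multiplicative update for the codes and a projected gradient step for the dictionary. The step size, iteration count, and the placeholder usgs_spectra variable are illustrative assumptions, not the exact settings used in the chapter.

```python
import numpy as np

def nnsc_dictionary(X, Q, lam=0.1, step=1e-3, iters=500, seed=0):
    """Sketch of non-negative sparse coding dictionary learning.
    X is the L x N matrix of library spectra; returns an L x Q dictionary."""
    rng = np.random.default_rng(seed)
    L, N = X.shape
    F = rng.random((L, Q))
    S = rng.random((Q, N))
    for _ in range(iters):
        # multiplicative update keeps the codes S non-negative and sparse
        S *= (F.T @ X) / (F.T @ F @ S + lam + 1e-12)
        # projected gradient step on the dictionary, then renormalize atoms
        F -= step * (F @ S - X) @ S.T
        F = np.clip(F, 0.0, None)
        F /= np.linalg.norm(F, axis=0, keepdims=True) + 1e-12
    return F

# e.g. F = nnsc_dictionary(usgs_spectra, Q=50)  # usgs_spectra: hypothetical
# array holding the USGS library as a 224 x 500 matrix
```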

A few elements of the dictionaries obtained by both methods are shown in Figure 7.2. It can easily be seen that the atoms of an adaptive dictionary give a better fit of the hyperspectral signals when used with the sparse recovery Algorithm 5.

Hence, we conclude that adaptive dictionaries are better for a joint sparse rep- resentation of the hyperspectral signals.

[Figure 7.2: left panel "Error" plots the relative residual norm (log scale) against support cardinality for the Gaussian and Learned dictionaries; the "Gaussian" and "Learned" panels plot atom amplitude against sample index for a few atoms of each dictionary.]