Preliminaries - Source Identification for Mixtures of Products

Chapter IV: Source Identification for Mixtures of Products

4.2 Preliminaries

mixture in whichPr(X= 1 |U = 0) = 1and the otherPr(X = 1|U = 1) = 0. More specifically, it is only possible to identify themeanof these two parameters. With access to multiple samples from thesamesource, however, we can begin to infer the parameters associated with each mixture component.

For thek-MixIID problem for binary rv’s, it is known thatn = 2k−1such samples [9] are needed. (This value is called there the threshold “aperture”

of the problem.)

Unlikek-MixIID, thek-MixProd problem does not allow us access to multiple samples of iid variables from a source. However it does give access to multiple variables which are independent conditional on the source. An approach introduced in [44] is to create synthetic copies of a single variable via linear combinations of the other variables; since we need to modify the method, we describe below how this is done. This approach enables a reduction from the k-MixProd problem to thek-MixIID problem.

Organization. Section4.2sets up the key mathematical objects needed for our work. Section4.3describes the algorithm. The algorithm is similar to that in [44], differing in one crucial way which will be described. Along with the algorithm pointers are provided to the main steps of the analysis in Appendix4.5. The part of this work which is a full departure from the previous literature is the condition number bound in Section4.4, in which the condition number of a key linear mapping (namely, the matrix of multilinear moments of the distribution on observables), is bounded through a novel argument characterizing the images of rank-1tensors under mapping by

is given byu⊙v := (u1v₁, . . . , u_kv_k). Equivalently, using the linear operator v_⊙ = diag(v), the Hadamard product isu⊙v =u·v_⊙. The identity element for the Hadamard product is the all-ones vector1. The Hadamard product ofuwith itselfktimes is writtenu^⊙k.

Definition 57(Hadamard extension). Forn∈R^[n]×[p], theHadamard extension ofn, writtenH(n), is the2ⁿ×pmatrix with rowsH(n)_Sfor allS⊆[n], where, forS ={i₁, . . . , i_ℓ},H(n)_S =n_i₁⊙ · · · ⊙n_i_ℓ; equivalentlyH(n)^j_S =Q

i∈Sn^j_i. In particularH(n)∅ =1, and for alli∈[n],H(n){i} =n_i.

Notation for subsets and collections of subsets We will reserve calligraphic fonts for collections of subsets of [n], i.e., S ⊆ 2^[n]. In contrast, the same variable in a standard math font will represent a subset of[n], i.e.,S ⊆[n]. Definition 58(Sum of two collections). Thesum of two collectionsof subsets X,Y ⊆ 2^[n]is defined asX +Y :={X∪Y :X ∈ X, Y ∈ Y}.

Definition 59(Kronecker product of vectors). LetS, T ⊂[n]be two disjoint sets. Letxandy, respectively, be vectors indexed by the subsets inX = 2^S and Y = 2^T, respectively. Then the Kronecker product x⊗ y is the vector indexed by the subsets inX +Y given by

(x⊗y)X∪Y :=x_Xy_Y, for allX ∈ X andY ∈ Y.

Note thatX +Y = 2^S∪T. (Notice that every setZ ∈ X +Y can be written uniquely asZ =X∪Y forX ∈ X andY ∈ Y. We don’t define the Kronecker product whenX ∩ Y ̸=∅.)

Definition 60 (Kronecker product of vectors — alternate definition). Let S, T ⊂ [n]be two disjoint sets. Letx ∈ R²^S andy ∈ R²^T, which is to say,x has a coordinatex_Rfor everyR⊆S, and similarly fory. Then theKronecker productx⊗yis the vector with coordinatesx_Rfor everyR⊆S∪T, given by

(x⊗y)_R :=xR∩SyR∩T.

In the algorithm and analysis we will make heavy use of the singular values of a matrixM, which we will denoteσ_max(M) =σ₁(M)≥σ₂(M)≥ · · ·.

When

v⁽ⁱ⁾ ^k_i=1 is a set of row vectors, we will write(v⁽¹⁾;v⁽²⁾;. . . ;v^(k))for the matrix obtained by vertically concatenating each row vector in order.

Finally, for a matrixM,M⁺will denote the Moore-Penrose inverse (i.e., the pseudoinverse), given by(M^TM)⁻¹M^Tin the case thatMhas full column rank.

Definition 61(Column-wise Khatri-Rao product of matrices). Thecolumn- wise Khatri-Rao productof matricesA ∈ R^n×k,B ∈ R^m×k is denotedA∗B; A∗Bhas dimensionsnm×kwith rowijgiven byA_i∗⊙B_j∗.

Definition 62(Vandermonde matrix). The Vandermonde matrixVdm(m)∈ R^k×k associated with a vector m ∈ R^k has entriesVdm(m)ji = m^j_i for j ∈ {0,1, . . . , k−1}andi∈ {1,2, . . . , k}.

Multilinear Moments

Using the notation from Sec.4.1, observe that the expectation of eachX_i is given bym_iπ^T. The model parameters are(π_j)_j∈[k]and(m_ij)i∈[n],j∈[k](up to a permutation of the range ofU).

When k > 1, observable variables X_i and X_i^′ will not in general be independent. We will make use of information about the dependencies between observables through measurements ofE[X_iX_i^′]. Generalizing this, for any S ⊆[n], we will define the random variableXS :=Q

i∈SXi. The data we ob- tain from our samples will be estimates ofE[XS]for different subsetsS ⊆[n]. These are known as themultilinear momentsof the distribution, since they are multilinear polynomials in the parametersm_ij.

Our goal will be to eventually compute power moments for the single variable X₁, i.e. the valuesE[X₁X₁⁽²⁾X₁⁽³⁾· · ·X₁^(ℓ)]forℓ≤2kwhere eachX₁^(j)is a copy ofX₁ that is iid conditioned onU. Such copies are not guaranteed to exist among theX_i, but we can still define (and our algorithm will compute) the power moments. Theℓth power moment is defined asE[X₁^⊙ℓ], the expectation of the product ofℓcopies ofX₁that are iid conditioned onU.

Now E[X_S] = P

jπ_jQ

i∈Sm_ij, or equivalently, E[X_S] = (m_i₁ ⊙m_i₂ ⊙ · · · ⊙ m_i_s)π^TwhereS ={i₁, i₂, . . . , i_s}.

If we replacemwith the Hadamard extensionM:=H(m), we can simplify the above equation toE[X_S] =M_Sπ^T.

We can also write power moments in terms of Hadamard products. In partic- ular, we haveE[X_i^⊙ℓ] =P

jπ_jm^ℓ_ij =m^⊙ℓ_i π^T.

Observe that source identification is not possible if M has less than full column rank, i.e.,rankM< k, as then the mixing weights cannot be unique.

For a collection of subsetsS ⊆ 2^[n], letM[S]denote the restriction ofMto the rowsM_S, S ∈ S. E.g.,M=M[2^[n]].

If we have non-overlapping collections of subsetsAandB, we can build a matrix of multilinear moments as follows:

Definition 63(Observable matrices). For collectionsA,B ⊆2^[n]withA ⊆2^S andB ⊆2^T for disjointS, T ⊆[n], we define the matrix

CB,A:= M[B]π⊙M[A]^T. For anyA∈ A, B ∈ B, we have

(CB,A)_B,A =M_Bπ⊙M^T_A

which is the multilinear momentE[XA∪B]and therefore observable.

The Empirical Multi-linear Moments For a finite sample drawn from the model, we let ˜g(S)be the empirical estimate of E[X_S], i.e., the fraction of samples for whichQ

i∈SX_i = 1. Theseg(S)˜ forS ⊆[n]are the complete list of “observables” of the model. Each converges, in the infinite-sample limit, to the valueg(S) :=E[X_S],

g(S) = MSπ^T=MSπ⊙1^T.

Our algorithm will heavily utilize (the empirical estimates of) matrices of observable moments.

Dalam dokumen The Identification of Discrete Mixture Models (Halaman 53-56)