
Chapter 3. Qualitative Properties of the Minimum Sum-of-Squares

3.4 Characterizations of the Local Solutions

to the value given by the centroid system in (d)).

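Proposition 3.4 One has

$$J_i(x) = \{\, j \in J \mid a_i \in A[x_j] \,\}. \tag{3.17}$$

Proof. From the formula of $h_{i,j}(x)$ it follows that

$$h_{i,j}(x) = \sum_{q \in J} \|a_i - x_q\|^2 - \|a_i - x_j\|^2.$$

Therefore, by (3.15) we have

$$\begin{aligned}
\varphi_i(x) &= \max_{j \in J} \Big[ \sum_{q \in J} \|a_i - x_q\|^2 - \|a_i - x_j\|^2 \Big] \\
&= \sum_{q \in J} \|a_i - x_q\|^2 + \max_{j \in J} \big( -\|a_i - x_j\|^2 \big) \\
&= \sum_{q \in J} \|a_i - x_q\|^2 - \min_{j \in J} \|a_i - x_j\|^2.
\end{aligned}$$

Thus, the maximum in (3.15) is attained when the minimum $\min_{j \in J} \|a_i - x_j\|^2$ is achieved. So, by (3.16),

$$J_i(x) = \Big\{ j \in J \;\Big|\; \|a_i - x_j\| = \min_{q \in J} \|a_i - x_q\| \Big\}.$$

This implies (3.17). □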
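Proposition 3.4 can be checked numerically on a small instance. The following is a minimal sketch, assuming three hypothetical data points in the plane and two hypothetical centroids; the function names are illustrative and not taken from the text. It computes $J_i(x)$ once from the definition (indices maximizing $h_{i,j}(x)$) and once from the nearest-centroid description, using 0-based indices where the text uses 1-based ones.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def h(a_i, x, j):
    """h_{i,j}(x): sum of squared distances from a_i to all centroids, minus the j-th one."""
    return sum(sq_dist(a_i, x_q) for x_q in x) - sq_dist(a_i, x[j])


def active_indices_via_h(a_i, x, tol=1e-12):
    """J_i(x) from its definition: indices j at which h_{i,j}(x) is maximal."""
    values = [h(a_i, x, j) for j in range(len(x))]
    top = max(values)
    return {j for j, v in enumerate(values) if top - v <= tol}


def active_indices_via_nearest(a_i, x, tol=1e-12):
    """J_i(x) as in (3.17): indices of the centroids closest to a_i."""
    dists = [sq_dist(a_i, x_j) for x_j in x]
    low = min(dists)
    return {j for j, d in enumerate(dists) if d - low <= tol}


# Hypothetical data points and centroid system (assumptions for illustration).
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
x = [(0.5, 0.5), (0.0, 0.0)]

for i, a_i in enumerate(data):
    by_h = active_indices_via_h(a_i, x)
    by_nearest = active_indices_via_nearest(a_i, x)
    assert by_h == by_nearest
    print(i, sorted(by_h))
```

The tolerance `tol` is a floating-point convenience; with exact arithmetic the two descriptions agree without it.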

Invoking the subdifferential formula for the maximum function (see [20, Proposition 2.3.12] and note that the Clarke generalized gradient coincides with the subdifferential of convex analysis if the functions in question are convex), we have

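$$\partial\varphi_i(x) = \mathrm{co}\{\nabla h_{i,j}(x) \mid j \in J_i(x)\} = \mathrm{co}\{2(\tilde{x}_j - \tilde{a}_{i,j}) \mid j \in J_i(x)\}, \tag{3.18}$$

where

$$\tilde{x}_j = (x_1, \dots, x_{j-1}, 0_{\mathbb{R}^n}, x_{j+1}, \dots, x_k) \tag{3.19}$$

and

$$\tilde{a}_{i,j} = (a_i, \dots, a_i, \underbrace{0_{\mathbb{R}^n}}_{j\text{-th position}}, a_i, \dots, a_i). \tag{3.20}$$

By the Moreau-Rockafellar theorem [84, Theorem 23.8], one has

$$\partial f_2(x) = \frac{1}{m} \sum_{i \in I} \partial\varphi_i(x) \tag{3.21}$$

with $\partial\varphi_i(x)$ being computed by (3.18).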

Now, suppose $x = (x_1, \dots, x_k) \in \mathbb{R}^{nk}$ is a local solution of (3.2). By the necessary optimality condition in DC programming (see, e.g., [31] and [77]), which can be considered as a consequence of the optimality condition obtained by Dem'yanov et al. in quasidifferential calculus (see, e.g., [25, Theorem 3.1] and [26, Theorem 5.1]), we have

$$\partial f_2(x) \subset \partial f_1(x). \tag{3.22}$$

Since $\partial f_1(x)$ is a singleton, $\partial f_2(x)$ must be a singleton too. This happens if and only if $\partial\varphi_i(x)$ is a singleton for every $i \in I$. By (3.18), if $|J_i(x)| = 1$, then $|\partial\varphi_i(x)| = 1$. In the case where $|J_i(x)| > 1$, we can select two elements $j_1$ and $j_2$ from $J_i(x)$, $j_1 < j_2$. As $\partial\varphi_i(x)$ is a singleton, by (3.18) one must have $\tilde{x}_{j_1} - \tilde{a}_{i,j_1} = \tilde{x}_{j_2} - \tilde{a}_{i,j_2}$. Using (3.19) and (3.20), and comparing the $j_1$-th and $j_2$-th blocks of both sides, one sees that the latter occurs if and only if $x_{j_1} = x_{j_2} = a_i$. To proceed further, we need to introduce the following condition on the local solution $x$.

(C1) The components of $x$ are pairwise distinct, i.e., $x_{j_1} \ne x_{j_2}$ whenever $j_2 \ne j_1$.

Definition 3.2 A local solution $x = (x_1, \dots, x_k)$ of (3.2) that satisfies (C1) is called a nontrivial local solution.

Remark 3.5 Proposition 3.3 shows that every global solution of (3.2) is a nontrivial local solution.

The following fundamental facts originate in [71, p. 346]. Here, a more precise and complete formulation is presented. In accordance with (3.17), the first assertion of the next theorem means that if $x$ is a nontrivial local solution, then for each data point $a_i \in A$ there is a unique component $x_j$ of $x$ such that $a_i \in A[x_j]$.

Theorem 3.3 (Necessary conditions for nontrivial local optimality) Suppose that $x = (x_1, \dots, x_k)$ is a nontrivial local solution of (3.2). Then, for any $i \in I$, $|J_i(x)| = 1$. Moreover, for every $j \in J$ such that the attraction set $A[x_j]$ of $x_j$ is nonempty, one has

$$x_j = \frac{1}{|I(j)|} \sum_{i \in I(j)} a_i, \tag{3.23}$$

where $I(j) = \{i \in I \mid a_i \in A[x_j]\}$. For any $j \in J$ with $A[x_j] = \emptyset$, one has

$$x_j \notin \mathcal{A}[x], \tag{3.24}$$

where $\mathcal{A}[x]$ is the union of the closed balls $\bar{B}(a_p, \|a_p - x_q\|)$ with $p \in I$, $q \in J$ satisfying $p \in I(q)$.

Proof. Suppose $x = (x_1, \dots, x_k)$ is a nontrivial local solution of (3.2). Given any $i \in I$, we must have $|J_i(x)| = 1$. Indeed, if $|J_i(x)| > 1$ then, by the analysis given before the formulation of the theorem, there exist indices $j_1$ and $j_2$ from $J_i(x)$ such that $x_{j_1} = x_{j_2} = a_i$. This contradicts the nontriviality of the local solution $x$. Let $J_i(x) = \{j(i)\}$ for $i \in I$, i.e., $j(i) \in J$ is the unique element of $J_i(x)$.

For each $i \in I$, observe by (3.15) that

$$h_{i,j}(x) < h_{i,j(i)}(x) = \varphi_i(x) \quad \forall j \in J \setminus \{j(i)\}.$$

Hence, by the continuity of the functions $h_{i,j}$, there exists an open neighborhood $U_i$ of $x$ such that

$$h_{i,j}(y) < h_{i,j(i)}(y) \quad \forall j \in J \setminus \{j(i)\},\ \forall y \in U_i.$$

It follows that

$$\varphi_i(y) = h_{i,j(i)}(y) \quad \forall y \in U_i. \tag{3.25}$$

So, $\varphi_i(\cdot)$ is continuously differentiable on $U_i$. Put $U = \bigcap_{i \in I} U_i$. From (3.14) and (3.25) one can deduce that

$$f_2(y) = \frac{1}{m} \sum_{i \in I} \varphi_i(y) = \frac{1}{m} \sum_{i \in I} h_{i,j(i)}(y) \quad \forall y \in U.$$

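Therefore, $f_2$ is a continuously differentiable function on $U$. Moreover, the formulas (3.18)–(3.20) yield

$$\nabla f_2(y) = \frac{2}{m} \sum_{i \in I} \big( \tilde{y}_{j(i)} - \tilde{a}_{i,j(i)} \big) \quad \forall y \in U, \tag{3.26}$$

where

$$\tilde{y}_{j(i)} = (y_1, \dots, y_{j(i)-1}, 0_{\mathbb{R}^n}, y_{j(i)+1}, \dots, y_k)$$

and

$$\tilde{a}_{i,j(i)} = (a_i, \dots, a_i, \underbrace{0_{\mathbb{R}^n}}_{j(i)\text{-th position}}, a_i, \dots, a_i).$$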

Substituting $y = x$ into (3.26) and combining the result with (3.22), we obtain

$$\sum_{i \in I} \big( \tilde{x}_{j(i)} - \tilde{a}_{i,j(i)} \big) = m(x_1 - a_0, \dots, x_k - a_0). \tag{3.27}$$

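Now, fix an index $j \in J$ with $A[x_j] \ne \emptyset$ and transform the left-hand side of (3.27) as follows:

$$\begin{aligned}
\sum_{i \in I} \big( \tilde{x}_{j(i)} - \tilde{a}_{i,j(i)} \big)
&= \sum_{i \in I,\ j(i) = j} \big( \tilde{x}_{j(i)} - \tilde{a}_{i,j(i)} \big) + \sum_{i \in I,\ j(i) \ne j} \big( \tilde{x}_{j(i)} - \tilde{a}_{i,j(i)} \big) \\
&= \sum_{i \in I,\ j(i) = j} \big( \tilde{x}_{j(i)} - \tilde{a}_{i,j(i)} \big) + \sum_{i \notin I(j)} \big( \tilde{x}_{j(i)} - \tilde{a}_{i,j(i)} \big).
\end{aligned}$$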

Clearly, if $j(i) = j$, then the $j$-th component of the vector $\tilde{x}_{j(i)} - \tilde{a}_{i,j(i)}$, which belongs to $\mathbb{R}^{nk}$, is $0_{\mathbb{R}^n}$. If $j(i) \ne j$, then the $j$-th component of the vector $\tilde{x}_{j(i)} - \tilde{a}_{i,j(i)}$ is $x_j - a_i$. Consequently, (3.27) gives us

$$\sum_{i \notin I(j)} (x_j - a_i) = m(x_j - a_0).$$

Since $m a_0 = a_1 + \dots + a_m$, this yields $\sum_{i \in I(j)} a_i = |I(j)|\, x_j$. Thus, formula (3.23) is valid for any $j \in J$ satisfying $A[x_j] \ne \emptyset$.

For any $j \in J$ with $A[x_j] = \emptyset$, one has (3.24). Indeed, suppose to the contrary that there exists $j_0 \in J$ with $A[x_{j_0}] = \emptyset$ such that for some $p \in I$, $q \in J$, one has $p \in I(q)$ and $x_{j_0} \in \bar{B}(a_p, \|a_p - x_q\|)$. If $\|a_p - x_{j_0}\| = \|a_p - x_q\|$, then $J_p(x) \supset \{q, j_0\}$. This is impossible due to the first claim of the theorem. Now, if $\|a_p - x_{j_0}\| < \|a_p - x_q\|$, then $p \notin I(q)$. We have thus arrived at a contradiction.

The proof is complete. □
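The stationarity relation behind (3.27) can be verified in exact arithmetic on a small instance. The sketch below rests on two assumptions that are not stated in this section: the data set is taken to be $a_1 = (0,0)$, $a_2 = (1,0)$, $a_3 = (0,1)$, and $f_1$ is taken to be $f_1(x) = \frac{1}{m}\sum_{i \in I}\sum_{q \in J}\|a_i - x_q\|^2$, which is what the right-hand side of (3.27) suggests. The function names are illustrative, and indices are 0-based.

```python
from fractions import Fraction as F


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def grad_f1(data, x):
    """Gradient of the assumed f1, one block per centroid: 2 (x_q - a_0)."""
    m = len(data)
    return [tuple(2 * sum(x_q[c] - a[c] for a in data) / m
                  for c in range(len(x_q))) for x_q in x]


def grad_f2(data, x):
    """Formula (3.26); only valid when every J_i(x) is a singleton."""
    m, k, n = len(data), len(x), len(x[0])
    grad = [[F(0)] * n for _ in range(k)]
    for a in data:
        dists = [sq_dist(a, x_q) for x_q in x]
        nearest = [j for j, d in enumerate(dists) if d == min(dists)]
        if len(nearest) != 1:
            raise ValueError("J_i(x) is not a singleton; (3.26) does not apply")
        j_i = nearest[0]
        for q in range(k):
            if q != j_i:  # the j(i)-th block of the summand is zero
                for c in range(n):
                    grad[q][c] += F(2, m) * (x[q][c] - a[c])
    return [tuple(block) for block in grad]


# Assumed data set and two test points (assumptions for illustration).
data = [(F(0), F(0)), (F(1), F(0)), (F(0), F(1))]
stationary = [(F(1, 2), F(1, 2)), (F(0), F(0))]
shifted = [(F(2, 5), F(1, 2)), (F(0), F(0))]

print(grad_f1(data, stationary) == grad_f2(data, stationary))
print(grad_f1(data, shifted) == grad_f2(data, shifted))
```

At the first point both centroids are the barycenters of their clusters, so the two gradients coincide; at the second point the first centroid has been moved off the barycenter and they differ.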

Roughly speaking, the necessary optimality condition given in the above theorem is a sufficient one. Therefore, in combination with Theorem 3.3, the next statement gives a complete description of the nontrivial local solutions of (3.2).

Theorem 3.4 (Sufficient conditions for nontrivial local optimality) Suppose that a vector $x = (x_1, \dots, x_k) \in \mathbb{R}^{nk}$ satisfies condition (C1) and $|J_i(x)| = 1$ for every $i \in I$. If (3.23) is valid for any $j \in J$ with $A[x_j] \ne \emptyset$ and (3.24) is fulfilled for any $j \in J$ with $A[x_j] = \emptyset$, then $x$ is a nontrivial local solution of (3.2).

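Proof. Let $x = (x_1, \dots, x_k) \in \mathbb{R}^{nk}$ be such that (C1) holds, $J_i(x) = \{j(i)\}$ for every $i \in I$, (3.23) is valid for any $j \in J$ with $A[x_j] \ne \emptyset$, and (3.24) is satisfied for any $j \in J$ with $A[x_j] = \emptyset$. Then, for all $i \in I$ and $j' \in J \setminus \{j(i)\}$, one has

$$\|a_i - x_{j(i)}\| < \|a_i - x_{j'}\|.$$

So, there exists $\varepsilon > 0$ such that

$$\|a_i - \tilde{x}_{j(i)}\| < \|a_i - \tilde{x}_{j'}\| \quad \forall i \in I,\ \forall j' \in J \setminus \{j(i)\}, \tag{3.28}$$

whenever the vector $\tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_k) \in \mathbb{R}^{nk}$ satisfies the condition $\|\tilde{x}_q - x_q\| < \varepsilon$ for all $q \in J$. (Here $\tilde{x}_q$ denotes the $q$-th component of the perturbed vector $\tilde{x}$, not the vector defined in (3.19).) By (3.24) and by the compactness of $\mathcal{A}[x]$, reducing the positive number $\varepsilon$ (if necessary) we have

$$\tilde{x}_j \notin \mathcal{A}[\tilde{x}] \tag{3.29}$$

for every $j \in J$ with $A[x_j] = \emptyset$, whenever the vector $\tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_k) \in \mathbb{R}^{nk}$ satisfies the condition $\|\tilde{x}_q - x_q\| < \varepsilon$ for all $q \in J$, where $\mathcal{A}[\tilde{x}]$ is the union of the closed balls $\bar{B}(a_p, \|a_p - \tilde{x}_q\|)$ with $p \in I$, $q \in J$ satisfying $p \in I(q) = \{i \in I \mid a_i \in A[x_q]\}$.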

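Fix any vector $\tilde{x} = (\tilde{x}_1, \dots, \tilde{x}_k) \in \mathbb{R}^{nk}$ with the property that $\|\tilde{x}_q - x_q\| < \varepsilon$ for all $q \in J$. Then, by (3.28) and (3.29), $J_i(\tilde{x}) = \{j(i)\}$. So, $\min_{j \in J} \|a_i - \tilde{x}_j\|^2 = \|a_i - \tilde{x}_{j(i)}\|^2$. Therefore, one has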

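$$\begin{aligned}
f(\tilde{x}) &= \frac{1}{m} \sum_{i \in I} \min_{j \in J} \|a_i - \tilde{x}_j\|^2 \\
&= \frac{1}{m} \sum_{i \in I} \|a_i - \tilde{x}_{j(i)}\|^2 \\
&= \frac{1}{m} \sum_{j \in J} \sum_{i \in I(j)} \|a_i - \tilde{x}_{j(i)}\|^2 \\
&= \frac{1}{m} \sum_{j \in J} \sum_{i \in I(j)} \|a_i - \tilde{x}_j\|^2 \\
&\ge \frac{1}{m} \sum_{j \in J} \sum_{i \in I(j)} \|a_i - x_j\|^2 = f(x),
\end{aligned}$$

where the inequality is valid because (3.23) obviously yields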

$$\sum_{i \in I(j)} \|a_i - x_j\|^2 \le \sum_{i \in I(j)} \|a_i - \tilde{x}_j\|^2$$

for every $j \in J$ such that the attraction set $A[x_j]$ of $x_j$ is nonempty. (Note that $x_j$ is the barycenter of $A[x_j]$.)
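The inequality used here rests on a standard identity for barycenters. The following short derivation is a sketch in the notation of (3.23), where $y$ stands for an arbitrary point of $\mathbb{R}^n$ (the symbol $y$ is introduced only for this illustration):

```latex
\begin{aligned}
\sum_{i \in I(j)} \|a_i - y\|^2
&= \sum_{i \in I(j)} \|(a_i - x_j) + (x_j - y)\|^2 \\
&= \sum_{i \in I(j)} \|a_i - x_j\|^2
   + 2 \Big\langle \sum_{i \in I(j)} (a_i - x_j),\; x_j - y \Big\rangle
   + |I(j)|\, \|x_j - y\|^2 \\
&= \sum_{i \in I(j)} \|a_i - x_j\|^2 + |I(j)|\, \|x_j - y\|^2
\;\ge\; \sum_{i \in I(j)} \|a_i - x_j\|^2 .
\end{aligned}
```

The middle term vanishes because (3.23) gives $\sum_{i \in I(j)} (a_i - x_j) = 0$. Taking $y$ to be the $j$-th component of the perturbed vector yields the stated inequality.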

The local optimality of $x = (x_1, \dots, x_k)$ has been proved. Hence, $x$ is a nontrivial local solution of (3.2). □
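Taken together, Theorems 3.3 and 3.4 give a finite test for nontrivial local optimality. A minimal sketch of such a test follows, assuming floating-point data and a tolerance `tol` for comparisons; the function names, the tolerance and the sample points are assumptions made for illustration. Indices are 0-based.

```python
import math


def nearest_indices(a, x, tol):
    """J_i(x): indices of the centroids closest to the data point a."""
    dists = [math.dist(a, x_j) for x_j in x]
    low = min(dists)
    return [j for j, d in enumerate(dists) if d - low <= tol]


def satisfies_conditions(data, x, tol=1e-9):
    """Check (C1), |J_i(x)| = 1, (3.23) and (3.24) for the centroid system x."""
    k = len(x)
    # (C1): the centroids are pairwise distinct.
    for j1 in range(k):
        for j2 in range(j1 + 1, k):
            if math.dist(x[j1], x[j2]) <= tol:
                return False
    # Every data point must have exactly one nearest centroid.
    assign = []
    for a in data:
        J_i = nearest_indices(a, x, tol)
        if len(J_i) != 1:
            return False
        assign.append(J_i[0])
    for j in range(k):
        members = [a for a, j_i in zip(data, assign) if j_i == j]
        if members:
            # (3.23): x_j is the barycenter of its attraction set.
            bary = tuple(sum(c) / len(members) for c in zip(*members))
            if math.dist(bary, x[j]) > tol:
                return False
        else:
            # (3.24): x_j lies outside every closed ball B(a_p, ||a_p - x_q||)
            # with p in I(q). With unique assignments this already follows
            # from the check above; it is kept to mirror the statement.
            for a_p, q in zip(data, assign):
                if math.dist(a_p, x[j]) <= math.dist(a_p, x[q]) + tol:
                    return False
    return True


# Hypothetical data set and centroid systems.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(satisfies_conditions(data, [(0.5, 0.5), (0.0, 0.0)]))
print(satisfies_conditions(data, [(0.4, 0.5), (0.0, 0.0)]))
```

The first system passes all four checks; the second fails (3.23) because its first centroid is not the barycenter of the two points attracted to it.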

Example 3.2 (A local solution need not be a global solution) Consider the clustering problem described in Example 3.1. Here, we have $I = \{1, 2, 3\}$ and $J = \{1, 2\}$. By Theorem 3.1, problem (3.2) has a global solution. Moreover, if $x = (x_1, x_2) \in \mathbb{R}^{2 \times 2}$ is a global solution then, for every $j \in J$, the attraction set $A[x_j]$ is nonempty. Thanks to Remark 3.5, we know that $x$ is a nontrivial local solution. So, by Theorem 3.3, the attraction sets $A[x_1]$ and $A[x_2]$ are disjoint. Moreover, the barycenter of each one of these sets can be computed by formula (3.23). Clearly, $A = A[x_1] \cup A[x_2]$. Since $A[x_j] \subset A = \{a_1, a_2, a_3\}$, allowing permutations of the components of each vector $x = (x_1, x_2) \in \mathbb{R}^{2 \times 2}$ (see Remark 3.4), we can assert that the global solution set of our problem is contained in the set

$$\Big\{ \bar{x} := \big( (\tfrac{1}{2}, \tfrac{1}{2}), (0, 0) \big),\ \hat{x} := \big( (0, \tfrac{1}{2}), (1, 0) \big),\ \tilde{x} := \big( (\tfrac{1}{2}, 0), (0, 1) \big) \Big\}. \tag{3.30}$$

Since $f(\bar{x}) = \frac{1}{3}$ and $f(\hat{x}) = f(\tilde{x}) = \frac{1}{6}$, we infer that $\hat{x}$ and $\tilde{x}$ are global solutions of our problem. Using Theorem 3.4, we can assert that $\bar{x}$ is a local solution. Thus, $\bar{x}$ is a local solution which does not belong to the global solution set, i.e., $\bar{x}$ is a local-nonglobal solution of our problem.
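The objective values quoted above can be reproduced in exact rational arithmetic. The sketch below assumes that the data set of Example 3.1, which is not restated in this section, is $a_1 = (0,0)$, $a_2 = (1,0)$, $a_3 = (0,1)$; this choice is consistent with the centroids listed in (3.30). The function names are illustrative.

```python
from fractions import Fraction as F


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def f(data, x):
    """MSSC objective: mean squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(a, x_j) for x_j in x) for a in data) / len(data)


# Assumed data set of Example 3.1 (an assumption, see the note above).
data = [(F(0), F(0)), (F(1), F(0)), (F(0), F(1))]

# The three centroid systems listed in (3.30).
x_bar = [(F(1, 2), F(1, 2)), (F(0), F(0))]
x_hat = [(F(0), F(1, 2)), (F(1), F(0))]
x_tilde = [(F(1, 2), F(0)), (F(0), F(1))]

print(f(data, x_bar), f(data, x_hat), f(data, x_tilde))  # prints: 1/3 1/6 1/6
```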

Example 3.3 (Complete description of the set of nontrivial local solutions) Again, consider the MSSC problem given in Example 3.1. Allowing permutations of the components of each vector in $\mathbb{R}^{2 \times 2}$, by Theorems 3.3 and 3.4 we find that the set of nontrivial local solutions consists of the three vectors described in (3.30) and all the vectors of the form $x = (x_1, x_2) \in \mathbb{R}^{2 \times 2}$, where $x_1 = (\frac{1}{3}, \frac{1}{3})$ and

$$x_2 \notin \bar{B}(a_1, \|a_1 - x_1\|) \cup \bar{B}(a_2, \|a_2 - x_1\|) \cup \bar{B}(a_3, \|a_3 - x_1\|).$$

This set of nontrivial local solutions is unbounded and non-closed.

Example 3.4 (Convergence analysis of the k-means algorithm) Consider once again the problem described in Example 3.1. By the results given in Example 3.3, the centroid systems in items (a), (b), (c) and (d) of Example 3.1 are local solutions. In addition, by Example 3.2, the centroid systems in the just mentioned items (a) and (c) are global solutions. Concerning the centroid system in item (e) of Example 3.1, remark that $x := \big( (\frac{1}{3}, \frac{1}{3}), (1 + \frac{\sqrt{5}}{3}, 0) \big)$ is not a local solution by Theorem 3.3, because $a_2 \in A[x_1] \cap A[x_2]$, i.e., $J_2(x) = \{1, 2\}$ (see Figure 3.1). In general, with $x_1 = (\frac{1}{3}, \frac{1}{3})$ and $x_2 \in \mathbb{R}^2$ belonging to the boundary of the set

$$\bar{B}(a_1, \|a_1 - x_1\|) \cup \bar{B}(a_2, \|a_2 - x_1\|) \cup \bar{B}(a_3, \|a_3 - x_1\|),$$

$x := (x_1, x_2)$ is not a local solution of the MSSC problem under consideration.

The above analysis shows that the k-means algorithm is very sensitive to the choice of starting centroids. The algorithm may give a global solution, a local-nonglobal solution, as well as a centroid system which is not a local solution. In other words, the quality of the obtained result greatly depends on the initial centroid system.
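The dependence on the starting centroids can be illustrated with a small experiment. The sketch below is a plain Lloyd-type iteration written for this illustration, not a reproduction of the algorithm statement or of the starting points in Example 3.1. It assumes the data set $a_1 = (0,0)$, $a_2 = (1,0)$, $a_3 = (0,1)$, breaks ties toward the smallest index, and leaves a centroid unchanged when its cluster is empty; the two starting systems are hypothetical.

```python
from fractions import Fraction as F


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def f(data, x):
    """MSSC objective: mean squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(a, x_j) for x_j in x) for a in data) / len(data)


def lloyd(data, x, max_iter=100):
    """Alternate assignment and barycenter updates until the centroids stop changing."""
    x = list(x)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (ties go to the smallest index).
        assign = [min(range(len(x)), key=lambda j: sq_dist(a, x[j])) for a in data]
        new_x = []
        for j in range(len(x)):
            members = [a for a, j_a in zip(data, assign) if j_a == j]
            if members:
                new_x.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_x.append(x[j])  # empty cluster: keep the centroid
        if new_x == x:
            break
        x = new_x
    return x


# Assumed data set and two hypothetical starting centroid systems.
data = [(F(0), F(0)), (F(1), F(0)), (F(0), F(1))]
start_1 = [(F(1), F(1)), (F(-1, 2), F(-1, 2))]
start_2 = [(F(0), F(4, 5)), (F(1), F(0))]

for start in (start_1, start_2):
    result = lloyd(data, start)
    print(result, f(data, result))
```

Under these assumptions the first start stops at the centroid system with objective value 1/3 and the second at a system with objective value 1/6, so the same iteration returns a local-nonglobal solution or a global one depending only on where it begins.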

Figure 3.1: The centroids in item (e) of Example 3.1