3.3.2 Basic FCM Algorithm
The indicator (3.54), depending on the partition $U$ and the prototypes $M$, is not a convex function of its arguments. Nonetheless, the problem of its minimisation can be simplified by fixing the value of one of the matrices $U$ or $M$, that is, by assuming $U=\tilde U$ or $M=\tilde M$, respectively. In such a case the simplified functions $J_\alpha(M)=J_\alpha(\tilde U,M)$ and $J_\alpha(U)=J_\alpha(U,\tilde M)$ are convex functions of their arguments; consult [69, 271]. Hence, the classical optimisation methods are applicable, i.e. to determine the vector of prototypes, the system of $k$ equations of the form

$$\frac{\partial}{\partial \mu_j} J_\alpha(U,M) = 0, \qquad j=1,\dots,k \tag{3.55}$$

is solved, and solving for the matrix $U$ relies on creating the Lagrange function
$$L(J_\alpha,\lambda) = \sum_{i=1}^{m}\sum_{j=1}^{k} u_{ij}^{\alpha}\,\|x_i-\mu_j\|^2 \;-\; \sum_{i=1}^{m}\lambda_i\Bigl(\sum_{j=1}^{k} u_{ij} - 1\Bigr) \tag{3.56}$$
and computing the values $u_{ij}$ from the equation system

$$\begin{cases}
\dfrac{\partial}{\partial u_{ij}} L(J_\alpha,\lambda) = 0\\[2mm]
\dfrac{\partial}{\partial \lambda_i} L(J_\alpha,\lambda) = 0
\end{cases}
\qquad i = 1,\dots,m,\; j = 1,\dots,k \tag{3.57}$$
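For the reader's convenience, here is a sketch of how the first equation of (3.57) is solved for $u_{ij}$, assuming $\|x_i-\mu_j\| > 0$ for all $j$ (the full derivation is given in Appendix A):

$$\frac{\partial L}{\partial u_{ij}} = \alpha\, u_{ij}^{\alpha-1}\,\|x_i-\mu_j\|^2 - \lambda_i = 0
\;\Longrightarrow\;
u_{ij} = \left(\frac{\lambda_i}{\alpha\,\|x_i-\mu_j\|^2}\right)^{\frac{1}{\alpha-1}}$$

Substituting this into the constraint $\sum_{l=1}^{k} u_{il} = 1$ (the second equation of (3.57)) eliminates $\lambda_i$ and yields the first case of formula (3.59) below.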
The solution of the task of minimisation of function (3.54) in the set of fuzzy partitions $U_{fk}$ is obtained iteratively by computing, for a fixed partition $U$, the prototypes (see Appendix A)

$$\mu_{jl} = \frac{\sum_{i=1}^{m} u_{ij}^{\alpha}\, x_{il}}{\sum_{i=1}^{m} u_{ij}^{\alpha}}, \qquad j=1,\dots,k,\; l=1,\dots,n \tag{3.58}$$

and then the new assignments to clusters
$$u_{ij} =
\begin{cases}
\left[\displaystyle\sum_{l=1}^{k}\left(\dfrac{\|x_i-\mu_j\|}{\|x_i-\mu_l\|}\right)^{\frac{2}{\alpha-1}}\right]^{-1} & \text{if } Z_i = \emptyset\\[4mm]
\tilde u_{ij} & \text{if } j \in Z_i \neq \emptyset\\[1mm]
0 & \text{if } j \notin Z_i \neq \emptyset
\end{cases} \tag{3.59}$$

where $Z_i = \{j\colon 1 \le j \le k,\ \|x_i-\mu_j\| = 0\}$, and the values $\tilde u_{ij}$ are chosen in such a way that $\sum_{j\in Z_i}\tilde u_{ij} = 1$. Usually, the non-empty set $Z_i$ contains one element, hence $\tilde u_{ij} = 1$. If we substitute $d(x_i,\mu_j) = \|x_i-\mu_j\| + \varepsilon$, where $\varepsilon$ is a number close to zero, e.g. $\varepsilon = 10^{-10}$, then the above equation can be simplified to
$$u_{ij} = \left[\sum_{l=1}^{k}\left(\frac{d(x_i,\mu_j)}{d(x_i,\mu_l)}\right)^{\frac{2}{\alpha-1}}\right]^{-1} \tag{3.60}$$

This last equation can be rewritten in an equivalent form that requires a lower number of divisions:

$$u_{ij} = \frac{d^{\frac{2}{1-\alpha}}(x_i,\mu_j)}{\sum_{l=1}^{k} d^{\frac{2}{1-\alpha}}(x_i,\mu_l)} \tag{3.61}$$

Note that the membership degree $u_{ij}$ depends not only on the distance of the $i$-th object from the centre of the $j$-th cluster but also on its distances to the centres of the other clusters. Furthermore, when $\alpha=2$, the denominator of the expression (3.61) equals $k$ divided by the harmonic average of the squared distances of the object from the cluster centres. In this case, the FCM algorithm somewhat resembles the KHM algorithm from Sect. 3.1.5.4.
The FCM algorithm termination condition is usually defined as stabilisation of the partition matrix. If $U^t$, $U^{t+1}$ denote the matrices obtained in subsequent iterations, then the computations are terminated as soon as $\max_{i,j}|u_{ij}^{t+1}-u_{ij}^{t}| \le \varepsilon$, where $\varepsilon$ is a predefined precision, e.g. $\varepsilon = 0.0001$. Of course, one can change the order of the steps, i.e. first initialise the prototype matrix $M$ and determine the corresponding cluster assignments $u_{ij}$, and subsequently update the prototypes. The last two steps are repeated till the vectors $\mu_j$ stabilise, i.e. till $|\mu_{jl}^{t+1}-\mu_{jl}^{t}| < \varepsilon$, where $j=1,\dots,k$, $l=1,\dots,n$. Note that the number of comparisons required in deciding on computation termination is in the second case usually lower than in the first one: matrix $M$ contains only $kn$ elements, while matrix $U$ has $mk$ elements, and typically $m \gg n$.
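To make the two update steps concrete, below is a minimal sketch of the basic FCM iteration in Python/NumPy, following Eqs. (3.58) and (3.61) with the $\varepsilon$-regularised distance; the function and parameter names are ours, not part of any standard library.

```python
import numpy as np

def fcm(X, k, alpha=2.0, eps=1e-4, d_eps=1e-10, max_iter=100, seed=None):
    """Basic fuzzy c-means: X is an (m, n) data matrix, k the number of clusters."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # random initial fuzzy partition; each row of U sums to 1
    U = rng.random((m, k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Ua = U ** alpha
        # prototype update, Eq. (3.58)
        M = (Ua.T @ X) / Ua.sum(axis=0)[:, None]
        # regularised distances d(x_i, mu_j) = ||x_i - mu_j|| + d_eps
        D = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2) + d_eps
        # membership update, Eq. (3.61): u_ij proportional to d_ij^(2/(1-alpha))
        W = D ** (2.0 / (1.0 - alpha))
        U_new = W / W.sum(axis=1, keepdims=True)
        # termination: stabilisation of the partition matrix
        if np.max(np.abs(U_new - U)) <= eps:
            U = U_new
            break
        U = U_new
    return U, M
```

Replacing the termination test on $U$ with a test on $M$ gives the second variant discussed above.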
The fuzziness coefficient $\alpha$ is an important parameter of the algorithm, because its properties heavily depend on the value of this coefficient. If this value is close to one, the algorithm behaves like the classical $k$-means algorithm. If $\alpha$ grows without bounds, then the prototypes converge to the gravity centre of the object set $X$. Several heuristic methods for the selection of this coefficient have been proposed in the literature [64, 380]. The best recommendation is to choose it from the interval $[1.5, 2.5]$.
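Both limiting behaviours can be read off directly from the membership formula (3.61). The following toy computation (our own illustration, not taken from the source) shows that for $\alpha$ close to 1 the memberships become nearly crisp, while for large $\alpha$ they approach the uniform value $1/k$, so the prototype update (3.58) pulls every $\mu_j$ towards the global gravity centre.

```python
import numpy as np

# distances of one object to k = 3 cluster centres
d = np.array([1.0, 2.0, 4.0])

for alpha in (1.05, 2.0, 50.0):
    w = d ** (2.0 / (1.0 - alpha))   # numerators of Eq. (3.61)
    u = w / w.sum()
    print(alpha, np.round(u, 3))

# alpha = 1.05: memberships close to (1, 0, 0) -> crisp, k-means-like
# alpha = 50.0: memberships close to (1/3, 1/3, 1/3) -> near-uniform
```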
One has, however, to remember that the mentioned heuristics result from empirical investigations and may not reflect all the issues present in real-world data.²⁸ The paper [522] suggests some rules for the choice of the value of $\alpha$, pointing at the fact that the coefficient choice depends on the data themselves. The impact of parameter choice on the behaviour of the FCM algorithm was investigated by Choe and Jordan in [114]. The influence of the parameter $\alpha$ on the number of iterations until stabilisation of the matrix $U$, and on the distance of the returned prototypes $\mu_j$ from the intrinsic ones $\mu^*_j$, is depicted in Fig. 3.8. The average distance has been computed as
²⁸ Dunn [166] recommended $\alpha = 2$ on the grounds of the capability of the algorithm to reconstruct well separated clusters.
$$d_{avg} = \frac{1}{k}\sum_{j=1}^{k}\|\mu^*_j - \mu_j\| \tag{3.62}$$
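In practice the prototypes returned by FCM come in arbitrary order, so before applying Eq. (3.62) they have to be matched with the intrinsic centres. A minimal sketch follows; the matching step via `scipy.optimize.linear_sum_assignment` is our addition, not discussed in the source.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def d_avg(M_star, M):
    """Average distance (3.62) between the intrinsic centres M_star and
    the returned prototypes M, after optimally matching the two sets."""
    # cost[i, j] = distance between intrinsic centre i and prototype j
    cost = np.linalg.norm(M_star[:, None, :] - M[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()
```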
We chose the set iris.data, stemming from the repository [34], to illustrate the comparison. The intrinsic centres of the groups are given by the matrix (3.63) below.
$$M^* = \begin{bmatrix}
5.00 & 3.42 & 1.46 & 0.24\\
5.93 & 2.77 & 4.26 & 1.32\\
6.58 & 2.97 & 5.55 & 2.02
\end{bmatrix} \tag{3.63}$$
We can deduce from the figure that the increase of the value of the coefficient $\alpha$ causes an increase in the number of iterations. Interestingly, for $\alpha$ values close to 1 the error measured with the quantity $d_{avg}$ initially decreases and then, after exceeding the optimal value (here $\alpha^* = 1.5$), it grows. The standard deviation is close to zero for low $\alpha$ values, which means a high repeatability of the results (Fig. 3.8).
Remark 3.3.3 Schwämmle and Jensen [422] investigated randomly generated data sets and concluded that the best value of the $\alpha$ parameter is a function of the dimension $n$ and the amount of data $m$. Based on their experiments, they proposed the following analytical form of this function:

$$\alpha(m,n) = 1 + \left(\frac{1418}{m} + 22.05\right) n^{-2} + \left(\frac{12.33}{m} + 0.243\right) n^{-0.0406\,\ln(m) - 0.1134} \tag{3.64}$$

This result contradicts the common practice of choosing $\alpha = 2$.
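A direct transcription of Eq. (3.64), handy for checking how far the suggested fuzzifier deviates from the default $\alpha = 2$ (the function name is ours):

```python
import math

def alpha_schwaemmle_jensen(m, n):
    """Suggested fuzziness exponent alpha as a function of the number of
    objects m and the dimension n, following Eq. (3.64)."""
    return (1.0
            + (1418.0 / m + 22.05) * n ** -2
            + (12.33 / m + 0.243) * n ** (-0.0406 * math.log(m) - 0.1134))

# e.g. for data of the size of the iris set (m = 150 objects, n = 4 features)
print(alpha_schwaemmle_jensen(150, 4))
```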
Fig. 3.8 Influence of the value of the fuzziness exponent $\alpha$ on the number of iterations (left panel) and on the quality of prototypes (right panel), the latter expressed in terms of the average distance $d_{avg}$ according to Eq. (3.62). The solid line indicates the mean value $\bar m$ of the respective quantity (averaged over 100 runs), and the dotted lines mark the values $\bar m \pm s$, with $s$ being the standard deviation; $\varepsilon = 10^{-8}$ was assumed. Experiments were performed for the data set iris.txt
It has also been observed that the minimal distance between the cluster centres, $V_{MCD}$, provides a good hint for the choice of the proper number of clusters $k$: it is recommended to choose as a potential candidate a value $k^*$ at which an abrupt decrease of the value of $V_{MCD}$ occurs.
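One possible reading of this criterion in code (our own sketch; `fcm` refers to the earlier sketch, and the scan over candidate values of $k$ is our assumption about how the criterion is applied):

```python
import numpy as np

def v_mcd(M):
    """Minimal pairwise distance between the cluster centres in M."""
    k = len(M)
    return min(np.linalg.norm(M[i] - M[j])
               for i in range(k) for j in range(i + 1, k))

# scan candidate numbers of clusters and look for an abrupt drop of V_MCD
# (X is the data matrix; fcm is the sketch given earlier)
# for k in range(2, 11):
#     _, M = fcm(X, k)
#     print(k, v_mcd(M))
```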
One should, however, keep in mind that these results were obtained mainly for clusters of Gaussian type. Both authors suggest great care when applying the derived conclusions to real data.
The FCM algorithm usually converges quickly to a stationary point; slow convergence should rather be treated as an indication of a poor starting point. Hathaway and Bezdek cite in [239, p. 243] experimental results on splitting a mixture of normal distributions into its constituent sets. An FCM algorithm with parameter $\alpha = 2$ needed 10–20 iterations to complete the task, while the EM algorithm required hundreds, and in some cases thousands, of iterations.
Encouraged by this discovery, the authors posed the following question. Let $p(y;\alpha_0,\alpha_1) = \alpha_0 p_0(y) + \alpha_1 p_1(y)$, where $p_0$ and $p_1$ are symmetric density functions with mean values 0 and 1, respectively, and with the expected values (with respect to the components) of the variable $|Y|^2$ being finite. Can these clusters be identified? Regrettably, it turns out that (for $\alpha = 2$) there exist values of $\alpha_0$, $\alpha_1$ for which the FCM algorithm erroneously identifies the means of both sub-populations. This implies that even if one could observe an infinite number of objects, the algorithm possesses only finite precision (with respect to the estimation of prototype locations). This result is far from being a surprise: FCM is an example of a non-parametric algorithm, and the quality index $J_\alpha$ does not refer to any statistical properties of the population. Hathaway and Bezdek conclude [239]: if the population components (components of a mixture of distributions) are sufficiently separable, i.e. each sub-population is related to a clear "peak" in the density function, then the FCM algorithm can be expected to identify the characteristic prototypes at least as well as the maximum likelihood method (and certainly much faster).
An initial analysis of the convergence of the FCM algorithm was presented in the paper [69], and a corrected version of the proof of convergence was published six years later in [243]. A careful analysis of the properties of this algorithm was initiated by Ismail and Selim [271]. This research direction was pursued, among others, by Kim, Bezdek and Hathaway [290], Wei and Mendel [500], and also Yu and Yang [523].