
Measures of Quality of Fuzzy Partition

In the document Modern Algorithms of Cluster Analysis (Pages 124–128)

Algorithms of Combinatorial Cluster Analysis

3.3 FCM: Fuzzy c-means Algorithm

3.3.3 Measures of Quality of Fuzzy Partition

It has also been observed that inspecting the minimal distance between the cluster centres, V_MCD, gives a good hint for the choice of the proper number of clusters k. It is recommended to choose as a potential candidate a value of k at which an abrupt decrease of the value of V_MCD occurs.

One should, however, keep in mind that these results were obtained mainly for Gaussian-type clusters. Both authors suggest great care when applying the derived conclusions to real data.

The FCM algorithm usually converges quickly to a stationary point. Slow convergence should rather be treated as an indication of a poor starting point. Hathaway and Bezdek cite in [239, p. 243] experimental results on splitting a mixture of normal distributions into constituent sets. An FCM algorithm with parameter α = 2 needed 10–20 iterations to complete the task, while the EM algorithm required hundreds, and in some cases thousands, of iterations.

Encouraged by this discovery, the authors posed the question: let p(y; α_0, α_1) = α_0 p_0(y) + α_1 p_1(y), where p_0 and p_1 are symmetric density functions with mean values 0 and 1, respectively, and with finite expected values (with respect to the components) of the variable |Y|². Can these clusters be identified? Regrettably, it turns out that (for α = 2) there exist values of α_0, α_1 for which the FCM algorithm erroneously identifies the means of both sub-populations. This implies that even if one could observe an infinite number of objects, the algorithm possesses only finite precision (with respect to the estimation of prototype location). This result is far from being a surprise: FCM is an example of a non-parametric algorithm, and the quality index J_α does not refer to any statistical properties of the population. Hathaway and Bezdek conclude [239]: if the population components (the components of a mixture of distributions) are sufficiently separable, i.e. each sub-population is related to a clear "peak" in the density function, then the FCM algorithm can be expected to identify the characteristic prototypes at least as well as the maximum likelihood method (and certainly much quicker).

An initial analysis of the convergence of the FCM algorithm was presented in the paper [69], and a correct version of the proof of convergence was presented 6 years later in the paper [243]. A careful analysis of the properties of this algorithm was initialised by Ismail and Selim [271]. This research direction was pursued, among others, by Kim, Bezdek and Hathaway [290], Wei and Mendel [500], and also Yu and Yang [523].


Chap. 4. Nonetheless, we will present here a couple of measures that play a special role in the evaluation of fuzzy clustering algorithms.

If the intrinsic group membership of objects is known, P^t = {C_1^t, …, C_k^t}, then the so-called purity index is applied. It reflects the agreement between the intrinsic partition and the one found by the algorithm. First, the fuzzy assignment matrix is transformed into a Boolean group-membership matrix U^b with entries

$$u^b_{ij} = \begin{cases} 1 & \text{if } j = \arg\max_{1 \le t \le k} u_{it} \\ 0 & \text{otherwise} \end{cases} \qquad (3.65)$$

A partition P^f = {C_1^f, …, C_k^f} is obtained in this way, with C_j^f = {i : u^b_{ij} = 1}. Subsequently, we construct the contingency table with entries m_{ij} = |C_i^t ∩ C_j^f|.

Finally, the agreement of the two partitions is calculated as

$$P(\mathcal{P}_1, \mathcal{P}_2) = \frac{1}{m} \sum_{i=1}^{k_1} \max_{1 \le j \le k_2} m_{ij} \qquad (3.66)$$

While purity is a general-purpose measure, the so-called reconstruction error is a measure designed exclusively for algorithms producing fuzzy partitions [387]. It is defined as the average distance between the original object and the reconstructed one, i.e.

$$e_r = \frac{1}{m} \sum_{i=1}^{m} \|\mathbf{x}_i - \hat{\mathbf{x}}_i\|^2 \qquad (3.67)$$

where the reconstruction x̂_i is performed according to the formula

$$\hat{\mathbf{x}}_i = \frac{\sum_{j=1}^{k} u_{ij}^{\alpha} \boldsymbol{\mu}_j}{\sum_{j=1}^{k} u_{ij}^{\alpha}}, \quad i = 1, \dots, m \qquad (3.68)$$

The lower the reconstruction error, the better the performance of the algorithm. One should, however, remember that low values of the parameter k induce rather high error values (e_r → 0 when k → m). The purity index therefore evaluates the precision of the algorithm, while the reconstruction error describes the quality of the encoding/decoding of objects by the prototypes and the assignment matrix. One can say that e_r is a measure of the dispersion of prototypes in the feature space. In particular, the reconstruction error decreases when the prototypes are moved towards the centres of dense areas of the feature space [214]. Hence, e_r measures the capability of the prototypes to represent individual clusters. The dependence of the reconstruction error on the fuzziness exponent α is illustrated in Fig. 3.9.
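Both measures are straightforward to compute. The sketch below is a minimal NumPy illustration (the function names and toy matrices are ours, not the book's): `purity` hardens the membership matrix as in (3.65) and evaluates (3.66), while `reconstruction_error` implements (3.67)–(3.68).

```python
import numpy as np

def purity(U, true_labels):
    """Purity (3.66): harden U via (3.65), then average the best overlaps."""
    m = U.shape[0]
    hard = U.argmax(axis=1)                      # j = arg max_t u_it, per object
    total = 0
    for c in np.unique(true_labels):             # one row of the contingency table
        members = hard[true_labels == c]         # found labels inside true class c
        counts = np.bincount(members, minlength=U.shape[1])
        total += counts.max()                    # max_j m_ij
    return total / m

def reconstruction_error(X, U, mu, alpha=2.0):
    """Reconstruction error (3.67), with reconstructions x̂_i from (3.68)."""
    Ua = U ** alpha
    Xhat = (Ua @ mu) / Ua.sum(axis=1, keepdims=True)
    return float(np.mean(np.sum((X - Xhat) ** 2, axis=1)))
```

For a crisp assignment that matches the intrinsic partition exactly, purity equals 1 and, when every object sits on its prototype, the reconstruction error vanishes.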

Other measures of the quality of fuzzy partitions are the partition coefficient F_k(U) and the fuzzy partition entropy H_k(U). They are defined as follows, see e.g. [70]:

Fig. 3.9 Influence of the α exponent (abscissa axis) on the reconstruction error (ordinate axis). Test data: file iris.txt

$$F_k(U) = \operatorname{tr}(UU^T)/m = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} u_{ij}^2 \qquad (3.69)$$

$$H_k(U) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} u_{ij} \log_a u_{ij}, \quad a > 1 \qquad (3.70)$$

They exhibit the following properties:

$$\begin{aligned}
F_k(U) = 1 &\Leftrightarrow H_k(U) = 0 \Leftrightarrow U \text{ is a crisp partition} \\
F_k(U) = \tfrac{1}{k} &\Leftrightarrow H_k(U) = \log_a(k) \Leftrightarrow U = [\tfrac{1}{k}] \\
\tfrac{1}{k} \le F_k(U) \le 1&; \qquad 0 \le H_k(U) \le \log_a(k)
\end{aligned} \qquad (3.71)$$

The entropy H_k is more sensitive to local changes in partition quality than the coefficient F_k.

When the data tend to concentrate into a small number of well-separated groups, these indicators constitute a good hint for the proper selection of the number of clusters.
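Equations (3.69)–(3.70) translate directly into code. A minimal sketch (with illustrative function names of our choosing), using the convention 0·log 0 := 0:

```python
import numpy as np

def partition_coefficient(U):
    """F_k(U) = tr(U U^T)/m = (1/m) * sum_i sum_j u_ij^2   (3.69)."""
    return float(np.mean(np.sum(U ** 2, axis=1)))

def partition_entropy(U, a=np.e):
    """H_k(U) = -(1/m) * sum_i sum_j u_ij * log_a(u_ij)    (3.70)."""
    V = np.where(U > 0.0, U, 1.0)   # log(1) = 0, so zero entries contribute 0
    return float(-np.sum(U * (np.log(V) / np.log(a))) / U.shape[0])
```

The bounds in (3.71) serve as a built-in sanity check: a crisp U gives F_k = 1 and H_k = 0, and the maximally fuzzy matrix U = [1/k] gives F_k = 1/k and H_k = log_a(k).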

Remark 3.3.4 The quantity H_k(U) is a measure indicating the degree of fuzziness of a partition: H_k(U) = 0 for any matrix U with elements u_ij ∈ {0, 1}, i.e. whenever U is a crisp partition. The entropy H(U), defined later in Eq. (4.31) in Sect. 4.4.2, allows a deeper exploration of the nature of a crisp partition. One should not confuse these two measures.

Example 3.3.1 To illustrate the above-mentioned thesis, consider the two sets presented in Fig. 3.10. The first set, data_6_2, contains two-dimensional data forming six clear-cut clusters. The second set, data_4_2, was obtained by randomly picking the same number of points from the two-dimensional normal distributions N(m_i, I), i = 1, …, 4, where m_1 = (3, 0)^T, m_2 = (0, 3)^T, m_3 = (3, 3)^T, m_4 = (0, 6)^T, and I is the unit covariance matrix. In both cases F_k(U) and H_k(U) were computed for various values of k, see Fig. 3.11. In the first case, with a clear structure, both indices reach clear-cut optima. In the second case both indices behave monotonically.
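The experiment can be reproduced in outline. The sketch below uses a minimal FCM of our own (the standard alternating prototype/membership updates for α > 1, not the book's reference implementation), draws data resembling data_4_2, and scans k while recording F_k(U):

```python
import numpy as np

def fcm(X, k, alpha=2.0, iters=100, seed=0):
    """Minimal FCM: alternate prototype update (3.54-style) and
    membership update u_ij ∝ d_ij^{-2/(alpha-1)}."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Ua = U ** alpha
        mu = (Ua.T @ X) / Ua.sum(axis=0)[:, None]            # prototypes
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (alpha - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)             # memberships
    return U, mu

# Data resembling data_4_2: four overlapping Gaussian clusters
rng = np.random.default_rng(42)
means = np.array([[3., 0.], [0., 3.], [3., 3.], [0., 6.]])
X = np.vstack([rng.normal(mi, 1.0, size=(50, 2)) for mi in means])

Fks = {}
for k in range(2, 8):
    U, _ = fcm(X, k)
    Fks[k] = float(np.mean(np.sum(U ** 2, axis=1)))          # F_k(U), Eq. (3.69)
print(Fks)
```

For overlapping data of this kind one should expect F_k to decay without a pronounced optimum as k grows, in line with the monotone behaviour reported in the example; the exact values depend on the random seed and initialisation.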

Fig. 3.10 Test sets data_6_2 (figure to the left) and data_4_2 (figure to the right)

Fig. 3.11 Dependence of the values of the quality criteria F_k(U) and H_k(U) for the sets data_6_2 (to the left) and data_4_2 (to the right) on the assumed number of classes
