
A hierarchical clustering procedure is one which successively merges smaller clusters into larger ones (agglomerative), or divides larger clusters into smaller ones (divisive). This process may be represented by a tree-like structure called a dendrogram, which depicts the relationship between objects or clusters. The dendrogram shows how single objects and clusters are grouped together at each step and provides a measure of similarity between them. The similarity measure here is the Euclidean distance: if the distance between two clusters is small, they are close together and hence more similar; if the distance is large, the clusters are less similar. The Euclidean distance on the y-axis of the dendrogram is the distance between the singletons, and thereafter the distance between the centroids of clusters.

Figure 4.4 demonstrates the hierarchical clustering method by way of an example. Figure 4.4 (a) shows a set of 5 points of morning and afternoon averages of kb. The method starts by assuming each point is a cluster on its own. The clusters that are closest together are then merged to form a new cluster. The distance between clusters 3 and 4 is 0.04, so they are clearly closer to each other than to the other clusters, and hence they merge to form cluster 6. The distances between clusters 2 and 1 and between clusters 2 and 5 are 0.23 and 0.21, respectively; therefore, clusters 2 and 5 merge to form cluster 7. Cluster 7 then merges with cluster 1 to form cluster 8. At the last step, clusters 6 and 8 merge. For merging clusters, Ward linkage was used. There are also other linkage options, such as single, average and complete linkage; however, according to Tufféry (2011), the Ward linkage (Ward, 1963) is considered the most effective linkage method.

Figure 4.4: (a) Five points that will be clustered using the hierarchical method. Each point starts off as a cluster on its own. (b) Dendrogram showing how the clusters in (a) were merged. Clusters 3 and 4 and clusters 2 and 5 were merged at distances 0.04 and 0.21, respectively. The centroid of cluster 7 was merged with cluster 1 at a distance of 0.4. Lastly, the centroids of clusters 6 and 8 were merged at a distance of 1.4.
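The merge sequence described above can be reproduced in a few lines with SciPy. This is a sketch only: the five coordinate pairs below are invented stand-ins for the points in Figure 4.4 (a), not the actual kb averages, and SciPy labels clusters from 0 rather than from 1.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical (morning average, afternoon average) pairs for five points;
# values are illustrative only, not the thesis data.
points = np.array([
    [0.30, 0.55],  # point 1
    [0.50, 0.60],  # point 2
    [0.62, 0.40],  # point 3
    [0.65, 0.43],  # point 4
    [0.55, 0.75],  # point 5
])

# Agglomerative clustering with Ward linkage and Euclidean distance.
# Each row of Z records one merge: [cluster_i, cluster_j, distance, size].
Z = linkage(points, method="ward", metric="euclidean")
for i, (a, b, d, n) in enumerate(Z):
    print(f"step {i}: merge {int(a)} and {int(b)} at distance {d:.3f} "
          f"-> new cluster {len(points) + i} with {int(n)} points")
```

With these coordinates, the first merge joins the two closest singletons (points 3 and 4 in the figure's 1-based numbering), mirroring the first step of the worked example.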

To demonstrate the use of hierarchical clustering on the minute-resolution kb profiles, the method was applied to the kb Principal Components. Ward's linkage method was used with the Euclidean distance as the metric. According to equation 4.3, Ward's linkage minimizes the total within-cluster sum of squared errors (SSE) when merging two clusters. The Ward's distance between two clusters A and B, having centers a and b and frequencies nA and nB, is given by

d(A, B) = \frac{d(a, b)^2}{n_A^{-1} + n_B^{-1}}, \qquad (4.3)

where a and b are the centroids of clusters A and B, respectively. Once all of the objects are clustered, the dendrogram is produced. Cutting the dendrogram at a desired level results in a set of disjoint groups (or clusters). In the present study, however, the optimal number of clusters was not known a priori, so the level at which the dendrogram should be cut had to be decided using an appropriate method. The present work used the cluster sum of squares as a guide to finding the level at which the dendrogram should be cut to yield the optimal number of clusters.
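Cutting the tree into a chosen number of disjoint groups can be done with SciPy's `fcluster`. The sketch below uses three synthetic, well-separated blobs rather than the kb Principal Components; `criterion="maxclust"` is equivalent to cutting the dendrogram at the height that yields the requested number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative data: three well-separated blobs (not the thesis kb data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=c, scale=0.1, size=(20, 2))
    for c in ([0, 0], [3, 0], [0, 3])
])

Z = linkage(X, method="ward", metric="euclidean")

# Cut the tree so that at most 3 disjoint clusters remain;
# labels returned by fcluster run from 1 upwards.
labels = fcluster(Z, t=3, criterion="maxclust")
print(sorted(set(labels)))
```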

Computing the cluster sum of squares for different clustering solutions can serve as a guide for choosing the optimal number of clusters. According to Tufféry (2011), the total sum of squares, I, of the cluster is the weighted mean of the squares of the distances of the individual points from the cluster center (or centroid), and is given by

I = \sum_{i \in I} p_i (x_i - \bar{x})^2, \qquad (4.4)

where x̄ is the mean of the xi and pi is the weight associated with observation i. In a similar manner, the sum of squares of a cluster is computed with respect to its own center:

I_j = \sum_{i \in I_j} p_i (x_i - \bar{x}_j)^2. \qquad (4.5)

If the data is partitioned into k clusters, each with sums of squares I1, . . . , Ik, then the within-cluster sum of squares, IW, is

I_W = \sum_{j=1}^{k} I_j. \qquad (4.6)

The between-cluster sum of squares, IB, is defined as the mean of the squares of the distances of the centers of each cluster from the global center, given by

I_B = \sum_{j \in \text{clusters}} \left( \sum_{i \in I_j} p_i \right) (\bar{x}_j - \bar{x})^2. \qquad (4.7)

Therefore, the total sum of squares is the sum of the within-cluster and between-cluster sums of squares:

I = I_W + I_B. \qquad (4.8)

Figure 4.5 illustrates this decomposition for a set of points: the total sum of squares is the sum of the within-cluster and between-cluster sums of squares.

Figure 4.5: The total cluster sum of squares (I) is the sum of the within-cluster sum of squares (IW) and the between-cluster sum of squares (IB). Global cluster centers are indicated in red. Adapted from Tufféry (2011).
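The decomposition in equation 4.8 can be checked numerically. The sketch below uses two synthetic clusters with unit weights (pi = 1, an assumption not stated in the text); under that assumption, IW + IB reproduces the total sum of squares exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two illustrative clusters with unit weights (p_i = 1 for every point).
A = rng.normal([0, 0], 0.5, size=(30, 2))
B = rng.normal([4, 1], 0.5, size=(40, 2))
X = np.vstack([A, B])
labels = np.array([0] * 30 + [1] * 40)

global_mean = X.mean(axis=0)

# I_W: squared distances of points from their own cluster centroid (eq 4.5, 4.6).
I_W = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
          for j in (0, 1))
# I_B: cluster sizes times squared distances of centroids from the
# global center (eq 4.7 with p_i = 1).
I_B = sum((labels == j).sum() *
          ((X[labels == j].mean(axis=0) - global_mean) ** 2).sum()
          for j in (0, 1))
# I: squared distances of all points from the global center (eq 4.4).
I_total = ((X - global_mean) ** 2).sum()

print(I_W + I_B, I_total)  # the two quantities agree: I = I_W + I_B
```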

The value of IW can be used to find the optimal number of clusters present in the data. If all points belong to one cluster, i.e. k = 1, IW will be high, since there will be points far away from the cluster centroid, thus increasing the sum of squares. As k increases, IW decreases, since there are more centroids and the clusters become more homogeneous. However, the largest k is not necessarily the best clustering solution. Instead, the number of clusters should be increased such that, if the last significant decrease in IW occurs when moving from k to k + 1 clusters, the partition into k + 1 clusters is taken as correct. This is demonstrated in Figure 4.6.

To decide on the level at which to cut the dendrogram and so obtain the kb clusters, Figure 4.6 shows IW computed for values of k ranging from 1 to 10. The curve starts at a high value for k = 1, which is expected since all objects are assigned to one cluster. As k increases, IW decreases dramatically and thereafter begins to flatten out as k approaches 10. Tufféry (2011) recommends that the value of k should be chosen such that, on moving from k to k + 1, there is an insignificant decrease in IW. However, Tufféry (2011) provides no criterion for what constitutes an insignificant decrease in IW, so choosing the cut-off value of k is a matter of judgement. For the minute-resolution kb data, the last significant decrease was judged to occur on moving from k = 3 to k = 4. Therefore, the optimal number of clusters was set to 4, and the dendrogram can now be cut at the level that yields 4 clusters.

Figure 4.6: Within-cluster sum of squares for varying values of k, for the kb clusters produced by the hierarchical method. For k = 1, IW is high. As k increases, IW decreases dramatically and thereafter begins to flatten as k approaches 10. The optimal value of k is 4, since moving from k = 3 to k = 4 yields the last significant decrease in IW.
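The selection procedure above can be sketched as follows, using synthetic data with four planted groups in place of the kb Principal Components. IW is computed for k = 1 to 10 from nested cuts of a single Ward tree; the successive drops in IW then indicate where the last significant decrease occurs.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# Illustrative data with four well-separated groups (not the thesis data).
centers = [(0, 0), (5, 0), (0, 5), (5, 5)]
X = np.vstack([rng.normal(c, 0.4, size=(25, 2)) for c in centers])

Z = linkage(X, method="ward", metric="euclidean")

def within_ss(X, labels):
    """I_W: sum over clusters of squared distances to the cluster centroid."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))

# Nested cuts of the same tree, so I_W is non-increasing in k.
I_W = [within_ss(X, fcluster(Z, t=k, criterion="maxclust"))
       for k in range(1, 11)]

# Size of the decrease at each step k -> k + 1; the curve flattens
# once k reaches the number of planted groups.
drops = [I_W[i] - I_W[i + 1] for i in range(len(I_W) - 1)]
print([round(v, 1) for v in I_W])
```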

The silhouette plot for the hierarchical kb clusters of the Durban data is given in Figure 4.7. Cluster 1 has a low SIC and is rather weakly clustered. Cluster 2 also has a low SIC. The SIC for Cluster 3 is above 0.8, indicating a compact cluster. Cluster 4 has a slightly lower SIC than Cluster 3, but it is nevertheless still sufficiently high for the cluster to be regarded as compact. The percentage of days in each cluster is also given. Days with negative SI values lie close to the border of two clusters and comprise 11% of days. For all 4 clusters produced by Ward's hierarchical method, SITOT was found to be 0.61.

Figure 4.7: Silhouette plot for clusters 1 to 4. Clusters 1 and 2 have low SIC, indicating less compact clusters. Clusters 3 and 4 have high SIC, indicating compact clusters. The percentage of days in each cluster is also given. Negative SI values correspond to days that lie closer to the border of the cluster.
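Per-cluster silhouette values of the kind shown in Figure 4.7 can be computed with scikit-learn. The data below are synthetic stand-ins with two loose and two compact groups, not the Durban kb data; the overall silhouette score is the mean of the per-point values.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_samples, silhouette_score

rng = np.random.default_rng(3)
# Illustrative stand-in: four groups of varying spread.
centers = [(0, 0), (4, 0), (0, 4), (4, 4)]
spreads = [1.0, 1.0, 0.3, 0.4]   # first two loose, last two compact
X = np.vstack([rng.normal(c, s, size=(30, 2))
               for c, s in zip(centers, spreads)])

labels = fcluster(linkage(X, method="ward"), t=4, criterion="maxclust")

si = silhouette_samples(X, labels)   # SI value for each point ("day")
for j in np.unique(labels):
    print(f"cluster {j}: mean SI = {si[labels == j].mean():.2f}")
print(f"SI_TOT = {silhouette_score(X, labels):.2f}")  # overall mean SI
```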

Ward's hierarchical clustering procedure applied to the PCA-reduced kb data produced the dendrogram in Figure 4.8. Using the within-cluster sum of squares criterion in Figure 4.6, the dendrogram was cut at the level that produced 4 clusters. A cluster map showing the first two Principal Components is given in Figure 4.9. Clusters 3 and 4 are relatively compact; however, Clusters 1 and 2 are less compact.
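The overall pipeline of this section, i.e. PCA reduction followed by Ward clustering and a cut at 4 clusters, can be sketched end-to-end. The random matrix below is only a placeholder for the minute-resolution kb profiles, and two components are an assumed (illustrative) choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Placeholder for the minute-resolution k_b profiles: 200 "days" x 60 "minutes".
profiles = rng.normal(size=(200, 60))

# Reduce to a few principal components, then cluster with Ward linkage
# and cut the tree at the level that yields 4 clusters, mirroring the text.
pcs = PCA(n_components=2).fit_transform(profiles)
Z = linkage(pcs, method="ward", metric="euclidean")
labels = fcluster(Z, t=4, criterion="maxclust")
print(np.unique(labels))
```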