

4.7 Batch-Incremental SNNDB Clustering Algorithms for Addition

4.7.4 Shared link properties between affected points post insertion

Any change in the KNN list of an affected point may alter the strength of the shared strong links associated with it. We present all possible scenarios for the state of the shared strong links between KN−S_add and S_add type affected points.

1. KN−S_add ←→ KN−S_add link: When new points enter the updated KNN lists of KN−S_add type points, some of the old points may get replaced.

8 For the first batch of arriving points, the core or non-core status of a point in D ∩ D′ is derived from the initial SNNDB execution upon D.

ALGORITHM 2: BISDBadd(D, K, δsim, δcore)

Input: D, K, δsim, δcore;
Output: Clusters;
// Set nrow as the total no. of data points after the increment of nrow2 points upon nrow1
nrow ← nrow1 + nrow2;
// Update the dataset after the increment
for i1 ← 1 to nrow2 do
    Append new data point i1 to the base dataset data matrix[] at index nrow1 + i1;
// Find the KNN lists of the new points with respect to the old points
for i ← 1 to nrow1 do
    for i1 ← nrow1 + 1 to nrow do
        Compute the distance between data points i1 and i;
        if i <= K then
            Insert data point i into the KNN list of data point i1;
        else
            Append data point i to the KNN list of data point i1;
            sort(KNN matrix[i1]);
            KNN matrix[i1].pop();
// Find points that can be potentially affected
for i1 ← nrow1 + 1 to nrow do
    for j1 ← nrow1 + 1 to nrow do
        if distance(i1, j1 | i1 ≠ j1) <= distance(i1, KNN matrix[i1][K]) then
            Insert data point j1 into the KNN list of data point i1;
            sort(KNN matrix[i1]);
            KNN matrix[i1].pop();
for i ← 1 to nrow1 do
    for k ← nrow1 + 1 to nrow do
        Compute the distance between i and k;
        if distance(i, k | i ≠ k) <= distance(i, KNN matrix[i][K]) then
            Insert data point k into the KNN list of data point i;
        else
            Do nothing;
// Identify KN−S_add and S_add type affected points
for i ← 1 to nrow do
    if KNN list[i].size() > K then
        i ∈ KN−S_add type points;
for each i ∈ KN−S_add do
    for each j ∈ KNN matrix[i] ∪ points displaced from KNN matrix[i] do
        if j ∉ KN−S_add ∧ j is not a new point then
            j ∈ S_add type points;
// Construct the updated K-SNN graph and detect core, non-core points incrementally
for each i ∈ KN−S_add ∪ S_add do
    for each j ∈ KNN matrix[i] do
        if similarity(i, j) > δsim then
            An edge is formed between points i and j;
    if similarity matrix[i].size() > δcore then
        i ∈ CORE points set;
    else
        i ∈ Non-CORE points set;
Cluster formation is similar to the SNNDB algorithm;
Repeat the entire process for the next batch of entering points;

If the removed points contributed to the shared link strength, then the similarity value is bound to decrease. However, if the replaced points were not part of the set contributing to the shared link strength, then the similarity value remains the same. Moreover, if the newly added points lie in the common neighborhood of two linked data objects and replace points that did not contribute to their shared link strength, then the inserted points add to the similarity value of the concerned pair. Therefore, for a KN−S_add ←→ KN−S_add type link, the strength of the shared link either decreases, remains the same or increases.

2. KN−S_add ←→ S_add link: The S_add type points do not change their KNN lists. Therefore, if the points removed from the KNN list of a KN−S_add type point previously contributed to its shared link strength with an S_add type point, then the strength of the shared link is bound to decrease. However, if the replaced points were not part of the set contributing to the shared link strength, then the similarity value remains identical for a KN−S_add ←→ S_add type link.

3. S_add ←→ S_add link: Since the KNN lists of S_add type points do not change after new insertions, the points that originally contributed to the shared link strength remain unaffected. As a result, no change in shared link strength is observed for an S_add ←→ S_add type link.
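The three link types can be checked on a small toy example. The sketch below is illustrative only and makes three assumptions. First, the shared link strength of a pair is taken to be the number of points common to their KNN lists, which is the usual SNN similarity. Second, any mutual-neighbour condition on links is ignored. Third, the data and the helper names `link_strength` and `link_change` are hypothetical. In the example, points a and b play the role of KN−S_add points whose lists admit a new point x, while c and d stand in for S_add points whose lists stay unchanged.

```python
from typing import Dict, Set

Knn = Dict[str, Set[str]]


def link_strength(knn: Knn, i: str, j: str) -> int:
    # Assumed SNN definition: number of points shared by the KNN lists of i and j.
    return len(knn[i] & knn[j])


def link_change(before: Knn, after: Knn, i: str, j: str) -> str:
    # Compare the strength of the i-j link before and after the insertion.
    old, new = link_strength(before, i, j), link_strength(after, i, j)
    if new < old:
        return "decreases"
    if new > old:
        return "increases"
    return "remains the same"


# Hypothetical KNN lists with K = 3.
# a, b: KN-S_add points; c, d: S_add points whose KNN lists do not change.
before: Knn = {
    "a": {"p", "q", "r"},
    "b": {"p", "q", "s"},
    "c": {"p", "r", "t"},
    "d": {"p", "t", "u"},
}
# The new point x enters the lists of a and b, displacing r and s respectively.
after: Knn = dict(before, a={"p", "q", "x"}, b={"p", "q", "x"})

results = {
    ("a", "b"): link_change(before, after, "a", "b"),  # KN-S_add <-> KN-S_add
    ("a", "c"): link_change(before, after, "a", "c"),  # KN-S_add <-> S_add, r was shared
    ("b", "c"): link_change(before, after, "b", "c"),  # KN-S_add <-> S_add, s was not shared
    ("c", "d"): link_change(before, after, "c", "d"),  # S_add <-> S_add
}
for pair, change in results.items():
    print(pair, change)
```

The four pairs behave as follows:

- The a ←→ b link gains strength, because x replaces r and s, which did not contribute to that link.
- The a ←→ c link loses r, which both lists shared, so its strength drops.
- The b ←→ c link is unchanged, because s was not shared.
- The c ←→ d link cannot change, since neither list is modified.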

4.7.5 Summary of the batch-incremental SNNDB clustering algorithms for addition

Table 4.4: Summary of the batch-incremental SNNDB clustering algorithms for addition

Component / Algorithm              BatchInc1          BatchInc2          BISDBadd
Updated KNN list                   Incrementally      Incrementally      Incrementally
Updated K-SNN graph                Non-incrementally  Incrementally      Incrementally
Updated core and non-core points   Non-incrementally  Non-incrementally  Incrementally

In an attempt to improve the efficiency of SNNDB when handling dynamic insertion, we initially propose the Batch−Inc1 algorithm. Batch−Inc1 computes the updated KNN lists of all the data points incrementally, while the rest of the components are computed as in SNNDB. To improve upon Batch−Inc1, we propose Batch−Inc2, which incrementally updates both the KNN lists and the K-SNN_updated graph upon the entry of new data points. The third algorithm, BISDBadd, computes all three components of SNNDB incrementally: in addition to constructing the KNN lists and the updated K-SNN graph, it also detects the core and non-core points incrementally (Refer Table 4.4).
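The incremental KNN-list maintenance, together with the identification of affected points used by BISDBadd, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes Euclidean distance, brute-force neighbour search, KNN lists sorted by ascending distance, and new points appended after the nrow1 old points. The function names `nearest` and `add_batch` are hypothetical.

Following Algorithm 2, the sketch handles an old point as follows:

- A new point enters the old point's list when it is no farther than the current K-th neighbour.
- The list then temporarily exceeds K entries, which marks the old point as KN−S_add.
- The entries trimmed off are kept as displaced points for the S_add test.

For simplicity the sketch uses sorting and is not tuned to match the complexity bounds discussed in the text.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Point = Sequence[float]


def nearest(i: int, data: List[Point], candidates: Iterable[int], k: int) -> List[int]:
    """Brute-force K nearest neighbours of point i among candidates, ascending distance."""
    others = (j for j in candidates if j != i)
    return sorted(others, key=lambda j: math.dist(data[i], data[j]))[:k]


def add_batch(
    data: List[Point], knn: Dict[int, List[int]], n_old: int, k: int
) -> Tuple[Dict[int, List[int]], Set[int], Set[int]]:
    """Update KNN lists after points n_old..len(data)-1 were appended (hypothetical helper).

    Returns (updated KNN lists, KN-S_add points, S_add points).
    """
    n = len(data)
    new_ids = range(n_old, n)
    updated = {i: list(nbrs) for i, nbrs in knn.items()}

    # KNN lists of the new points, computed over old and new points alike.
    for i1 in new_ids:
        updated[i1] = nearest(i1, data, range(n), k)

    # Old points: a new point enters if it is no farther than the current K-th neighbour.
    kn_sadd: Set[int] = set()
    displaced: Dict[int, List[int]] = {}
    for i in range(n_old):
        kth = math.dist(data[i], data[updated[i][-1]])
        entering = [p for p in new_ids if math.dist(data[i], data[p]) <= kth]
        if entering:  # the list temporarily exceeds K, so i is a KN-S_add point
            merged = sorted(updated[i] + entering, key=lambda j: math.dist(data[i], data[j]))
            updated[i], displaced[i] = merged[:k], merged[k:]
            kn_sadd.add(i)

    # S_add: old points that are current or displaced neighbours of a KN-S_add point.
    s_add: Set[int] = set()
    for i in kn_sadd:
        for j in updated[i] + displaced[i]:
            if j not in kn_sadd and j < n_old:
                s_add.add(j)
    return updated, kn_sadd, s_add


# Hypothetical 1-D example with K = 2: five old points, then one new point at 9.0.
old = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
knn0 = {i: nearest(i, old, range(len(old)), 2) for i in range(len(old))}
data = old + [(9.0,)]  # the new point gets index 5
knn1, kn_sadd, s_add = add_batch(data, knn0, n_old=len(old), k=2)
print(kn_sadd, s_add, knn1[5])  # {3, 4} {2} [3, 4]
```

In this example, points 3 and 4 admit the new point and displace point 2. Point 2 therefore becomes an S_add point, even though its own KNN list does not change.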

The SNNDB method takes O(N^2) time to complete, where N is the total number of data points. This is mainly due to the construction of the similarity matrix and the KNN lists. Batch−Inc1 provides a marginal improvement by building the updated KNN lists incrementally in O(N) time. However, building the K-SNN_updated graph non-incrementally still has quadratic time complexity. Batch−Inc2 addresses this issue by reconstructing the K-SNN_updated graph incrementally upon the entry of new data points. While building the K-SNN_updated graph, Batch−Inc2 only updates the shared strong link strengths of KN−S_add and S_add type points, leaving the remaining unaffected points untouched. For identifying the new core and non-core points, Batch−Inc2 still involves all the data points in D′ (the updated dataset). This results in Batch−Inc2 having linear time complexity. BISDBadd identifies the new set of core and non-core points incrementally and therefore improves upon the previous two sub-variant algorithms for addition. BISDBadd also runs in linear time (Refer Algorithm 2 for the pseudo-code of BISDBadd). Next, we present the time complexity analysis of the BISDBadd algorithm.
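The incremental detection of core and non-core points in BISDBadd can be sketched in the same spirit. The sketch makes three assumptions:

- The shared link strength (similarity) of two points is the number of points common to their KNN lists.
- Core status uses the strict thresholds written in Algorithm 2: similarity > δsim for a strong link, and strong-link count > δcore for a core point.
- The caller passes the set of points to refresh, for example KN−S_add ∪ S_add.

The names `snn_similarity` and `refresh_core_status` are hypothetical. Points outside the refreshed set keep the status they had after the previous batch, so the update does not revisit every point in D′.

```python
from typing import Dict, List, Set


def snn_similarity(knn: Dict[int, List[int]], i: int, j: int) -> int:
    # Assumed SNN similarity: size of the overlap of the two KNN lists.
    return len(set(knn[i]) & set(knn[j]))


def refresh_core_status(
    knn: Dict[int, List[int]],
    status: Dict[int, bool],
    affected: Set[int],
    delta_sim: int,
    delta_core: int,
) -> Dict[int, bool]:
    """Recompute strong links and core flags only for the affected points.

    Points outside affected keep the status from the previous batch.
    True marks a core point, False a non-core point.
    """
    updated = dict(status)
    for i in affected:
        # Strong links of i: neighbours whose shared similarity exceeds delta_sim.
        strong = [j for j in knn[i] if snn_similarity(knn, i, j) > delta_sim]
        updated[i] = len(strong) > delta_core
    return updated


# Hypothetical KNN lists (K = 3) and statuses from a previous batch.
knn = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2],
    4: [0, 5, 6], 5: [4, 6, 0], 6: [4, 5, 0],
}
prev = {0: False, 1: True, 2: True, 3: True, 4: False, 5: False, 6: False}
new = refresh_core_status(knn, prev, affected={0, 4}, delta_sim=1, delta_core=2)
print(new[0], new[4])  # True False
```

Only points 0 and 4 are re-examined. Point 0 has three strong links and becomes a core point, while point 4 has only two and stays non-core. All other points carry over their previous status unchanged.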

4.8 Time complexity analysis of the BISDBadd