4.7 Batch-Incremental SNNDB Clustering Algorithms for Addition

4.7.4 Shared link properties between affected points post insertion

Any change in the KNN list of an affected point may change the values of its associated shared strong links. We present all possible scenarios for the state of shared strong links between KN-S_add and S_add type affected points.

1. KN-S_add ↔ KN-S_add link: When new points enter the updated KNN list of KN-S_add type points, some of the old points may get replaced.

8 For the first batch of arriving points, the core or non-core status of a point in D ∩ D′ is derived from the initial SNNDB execution upon D.

ALGORITHM 2: BISDB_add(D, K, δ_sim, δ_core)

 1  Input: D, K, δ_sim, δ_core;
 2  Output: Clusters;
    // Set nrow as the total no. of data points after increment of nrow2 points upon nrow1
 3  nrow ← nrow1 + nrow2;
    // Update dataset after increment
 4  for i1 ← 1 to nrow2 do
 5      Append new data point i1 to base dataset data_matrix[];
 6      i ← i + 1;
    // Find KNN list of new points
 7  for i ← 1 to nrow1 do
 8      for i1 ← nrow1 to nrow do
 9          Compute the distance between data points i1 and i;
10          if i <= K then
11              Insert data point i to the KNN list of data point i1;
12          else
13              Insert data point i next to the KNN list of data point i1;
14              sort(KNN_matrix[i1]);
15              KNN_matrix[i1].pop();
    // Find points that can be potentially affected
16  for i1 ← nrow1 to nrow do
17      for j1 ← nrow1 to nrow do
18          if distance(i1, j1 | i1 ≠ j1) <= distance(i1, KNN_matrix[i1][K]) then
19              Insert data point j1 to the KNN list of data point i1;
20              sort(KNN_matrix[i1]);
21              KNN_matrix[i1].pop();
22          else  // no action
23  for i ← 1 to nrow1 do
24      for k ← nrow1 to nrow do
25          Compute distance between i and k;
26          if distance(i, k | i ≠ k) <= distance(i, KNN_matrix[i][K]) then
27              Insert data point k to the KNN list of data point i;
28          else
29              Do nothing;
    // Identify KN-S_add and S_add type affected points
30  for i ← 1 to nrow do
31      if KNN_list[i].size() > K then
32          i ∈ KN-S_add type points;
33      else  // no action
34  for each i ∈ KN-S_add do
35      for each j ∈ KNN_matrix[i] ∪ {points displaced from KNN_matrix[i]} do
36          if j ∉ KN-S_add ∧ j is not a new point then
37              j ∈ S_add type points;
38          else  // no action
    // Construct the updated K-SNN graph and detect core, non-core points incrementally
39  for each i ∈ KN-S_add ∪ S_add do
40      for each j ∈ KNN_matrix[i] do
41          if similarity(i, j) > δ_sim then
42              An edge is formed between points i and j;
43          else  // no action
44      if similarity_matrix[i].size() > δ_core then
45          i ∈ CORE points set;
46      else
47          i ∈ Non-CORE points set;
48  Cluster formation is similar to the SNNDB algorithm;
49  Repeat entire process for the next batch of entering points;

If the removed points contributed to the shared link strength, then the similarity value is bound to decrease. However, if the replaced points were not part of the set contributing to the shared link strength, then the similarity value remains the same. Moreover, if the newly added points lie in the common neighborhood of two linked data objects and replace only points that did not contribute to their shared link strength, then the inserted points add to the similarity value of the concerned pair. Therefore, for a KN-S_add ↔ KN-S_add type link, the strength of the shared link either decreases, remains the same, or increases.

2. KN-S_add ↔ S_add link: The S_add type points do not change their KNN list. Therefore, if the points removed from the KNN list of any KN-S_add type point previously contributed to the shared link strength with an S_add type point, then the strength of the shared link is bound to decrease. However, if the replaced points were not part of the set contributing to the shared link strength, then the similarity value remains identical for a KN-S_add ↔ S_add type link.

3. S_add ↔ S_add link: With no change in the KNN list of S_add type points after new insertions, the points that originally contributed to the shared link strength remain unaffected. As a result, no change in shared link strength is observed for an S_add ↔ S_add type link.
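The three outcomes for a shared link can be illustrated with a small Python sketch. It assumes, for illustration only, that shared link strength is the number of points common to two KNN lists; the helper name `shared_link_strength` and the point labels are hypothetical and do not come from the thesis. In the first two cases the list of `q` is held fixed, which also mirrors the KN-S_add ↔ S_add situation where only a decrease or no change is possible.

```python
def shared_link_strength(knn, i, j):
    """Assumed measure: number of points common to the KNN lists of i and j."""
    return len(set(knn[i]) & set(knn[j]))

# KNN lists (K = 4) before insertion; the labels are illustrative only.
before = {"p": ["a", "b", "c", "d"], "q": ["a", "b", "e", "f"]}

# (a) New point n replaces b, which contributed to the shared link.
decreased = {"p": ["a", "n", "c", "d"], "q": ["a", "b", "e", "f"]}

# (b) New point n replaces d, which did not contribute to the shared link.
unchanged = {"p": ["a", "b", "c", "n"], "q": ["a", "b", "e", "f"]}

# (c) n lies in the common neighborhood and replaces the
#     non-contributing points d (in p's list) and f (in q's list).
increased = {"p": ["a", "b", "c", "n"], "q": ["a", "b", "e", "n"]}

strengths = {
    name: shared_link_strength(knn, "p", "q")
    for name, knn in [
        ("before", before),
        ("decreased", decreased),
        ("unchanged", unchanged),
        ("increased", increased),
    ]
}
print(strengths)
```

With these lists the strength goes from 2 to 1 in case (a), stays at 2 in case (b), and rises to 3 in case (c).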

4.7.5 Summary of the batch-incremental SNNDB clustering algorithms for addition

Table 4.4: Summary of the batch-incremental SNNDB clustering algorithms for addition

| Components-Algorithm | Batch-Inc1 | Batch-Inc2 | BISDB_add |
| Updated KNN list | Incrementally | Incrementally | Incrementally |
| Updated K-SNN graph | Non-incrementally | Incrementally | Incrementally |
| Updated core and non-core points | Non-incrementally | Non-incrementally | Incrementally |

To improve efficiency over SNNDB while handling dynamic insertion, we first propose the Batch-Inc1 algorithm. Batch-Inc1 computes the updated KNN lists of all the data points incrementally, while the rest of the components are computed as in SNNDB. To improve upon Batch-Inc1, we propose Batch-Inc2, which incrementally rebuilds both the updated KNN lists and the K-SNN_updated graph upon entry of new data points. The third algorithm, BISDB_add, computes all three components of SNNDB incrementally. This involves detecting core and non-core points in addition to constructing the KNN lists and the updated K-SNN graph (refer to Table 4.4).
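The incremental core and non-core detection that separates BISDB_add from Batch-Inc2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: similarity is taken to be the number of shared neighbors, the function name `refresh_core_status` and the sample data are hypothetical, and the thresholds follow the strict comparisons (`>`) used in Algorithm 2.

```python
def refresh_core_status(knn, affected, is_core, delta_sim, delta_core):
    """Recompute strong links and core status for affected points only.

    knn      : dict mapping each point to its updated KNN list
    affected : KN-S_add and S_add type points
    is_core  : core status of every point before the insertion
    """
    status = dict(is_core)  # unaffected points keep their previous status
    strong = {}
    for i in affected:
        # Assumed similarity: size of the overlap between the two KNN lists.
        strong[i] = [
            j for j in knn[i]
            if len(set(knn[i]) & set(knn[j])) > delta_sim
        ]
        status[i] = len(strong[i]) > delta_core
    return strong, status


# Hypothetical updated KNN lists (K = 3) and previous core status.
knn = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2],
    4: [3, 5, 6], 5: [4, 6, 3], 6: [4, 5, 3],
}
previous = {0: False, 1: True, 2: True, 3: True, 4: True, 5: False, 6: False}

strong, status = refresh_core_status(
    knn, affected={0, 4}, is_core=previous, delta_sim=1, delta_core=2
)
print(strong[0], status[0])  # point 0 gains three strong links
print(strong[4], status[4])  # point 4 keeps only two strong links
```

Only points 0 and 4 are revisited; every other point retains its earlier status, which is the source of the saving over recomputing the status of all points in the updated dataset.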

The SNNDB method takes O(N²) time to complete, where N is the total number of data points. This is mainly due to the construction of the similarity matrix and the KNN lists. Batch-Inc1 provides a marginal improvement by building the updated KNN lists incrementally in O(N) time. However, building the K-SNN_updated graph non-incrementally still involves quadratic time complexity. Batch-Inc2 aims to address this issue by reconstructing the K-SNN_updated graph incrementally upon entry of new data points. While building the K-SNN_updated graph, Batch-Inc2 only updates the shared strong link strengths of KN-S_add and S_add type points, leaving the unaffected points untouched. For identifying the new core and non-core points, Batch-Inc2 involves all the data points in D′ (the updated dataset). This results in Batch-Inc2 having linear time complexity. BISDB_add identifies the new set of core and non-core points incrementally and therefore improves upon the previous two sub-variant algorithms for addition. BISDB_add also runs in linear time (refer to Algorithm 2 for the pseudo-code of BISDB_add). Next we present the time complexity analysis of the BISDB_add algorithm.
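As a rough illustration of how affected points are identified in Algorithm 2 (lines 23 to 37), the following Python sketch updates the KNN lists of existing points after a batch arrives and then separates KN-S_add from S_add type points. Several details are assumptions made for the example: Euclidean distance, the helper names `knn_list` and `classify_affected`, and the toy coordinates. The KNN lists of the new points themselves (lines 7 to 21) are omitted for brevity.

```python
import math


def knn_list(points, i, candidates, k):
    """Return the k candidates nearest to point i, closest first."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def classify_affected(points, knn, old_ids, new_ids, k):
    """Update KNN lists of old points and return (updated, kn_s_add, s_add)."""
    updated, displaced, kn_s_add = {}, {}, set()
    for i in old_ids:
        # Only current neighbors and new points can appear in the updated list.
        # Ties keep the existing neighbor here (Algorithm 2 uses <= instead).
        updated[i] = knn_list(points, i, knn[i] + list(new_ids), k)
        if set(updated[i]) != set(knn[i]):
            kn_s_add.add(i)  # KNN list changed
            displaced[i] = set(knn[i]) - set(updated[i])

    new, s_add = set(new_ids), set()
    for i in kn_s_add:
        # Current and displaced neighbors of a KN-S_add point that are old
        # points with an unchanged KNN list are S_add type points.
        for j in set(updated[i]) | displaced[i]:
            if j not in kn_s_add and j not in new:
                s_add.add(j)
    return updated, kn_s_add, s_add


# Toy base dataset: two well-separated groups on a line (K = 2).
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0),
          3: (10.0, 0.0), 4: (11.0, 0.0), 5: (12.0, 0.0)}
old_ids = list(points)
K = 2
knn = {i: knn_list(points, i, old_ids, K) for i in old_ids}

# A batch with one new point arriving next to point 2.
points[6] = (2.5, 0.0)
updated, kn_s_add, s_add = classify_affected(points, knn, old_ids, [6], K)
print(kn_s_add, s_add, updated[2])
```

In this toy run only point 2 changes its KNN list, points 0 and 1 become S_add type points through their link with point 2, and points 3 to 5 remain unaffected, so the later graph and core updates touch three points rather than all seven.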

4.8 Time complexity analysis of the BISDB_add
