Fast Adaptive Stacking Ensemble (FASE)
Data Mining
◇ Many real-world problems constantly generate large volumes of data
◇ The target function to be learned can change over time
◇ This is called Concept Drift
◇ Previously learned models can become outdated, or even contradictory with respect to the most recent data
Ensembles
◇ Combine the predictions from several base classifiers
■ improving on the predictive accuracy obtained by a single classifier
◇ Ensembles have three main components
■ A base learning algorithm
■ A method to weight training examples
■ A voting procedure
○ e.g. majority voting or weighted voting
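The two voting procedures above can be sketched as follows (a minimal illustration, not FASE's exact implementation; the weights in the weighted vote would come from, e.g., each classifier's estimated accuracy):

```python
from collections import defaultdict

def majority_vote(predictions):
    """Each base classifier gets one vote; the most frequent label wins."""
    counts = defaultdict(int)
    for label in predictions:
        counts[label] += 1
    return max(counts, key=counts.get)

def weighted_vote(predictions, weights):
    """Each vote is scaled by the classifier's weight."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

print(majority_vote(["spam", "ham", "spam"]))                  # spam
print(weighted_vote(["spam", "ham", "ham"], [0.9, 0.4, 0.4]))  # spam (0.9 > 0.8)
```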
Concept Drift with Ensembles
◇ Performance measures assess the ensemble's consistency with new data
◇ Significant changes in performance values are interpreted as concept drift
◇ In response, the ensemble dynamically eliminates, reactivates, or adds base classifiers
Why F.A.S.E.?
◇ Majority voting has been the primary way of making predictions in ensembles
◇ The underlying relationship between base classifiers and true prediction may be more complex than a simple linear combination of predictions.
◇ Uses Adaptive learners that deal with Drifting data
Hoeffding Drift Detection Method
◇ When a change is detected, the worst classifier is removed and a new one is added
◇ HDDM triggers
■ In-Control
○ When the current concept remains stable
■ Warning
○ When a concept drift is likely to be approaching
■ Out of Control
○ When a concept drift is detected
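The three trigger states can be sketched with a simplified Hoeffding-bound detector. This is an illustrative sketch, not the published HDDM formulas: it compares the current error mean against the lowest mean seen so far, flagging a warning or drift when the gap exceeds the Hoeffding bound at the corresponding confidence level.

```python
import math

class HoeffdingDriftDetector:
    """Simplified Hoeffding-bound drift detector (illustrative, not exact HDDM)."""

    def __init__(self, warn_alpha=0.005, drift_alpha=0.001):
        self.warn_alpha, self.drift_alpha = warn_alpha, drift_alpha
        self.n = 0                      # errors observed so far
        self.error_sum = 0.0
        self.min_mean = float("inf")    # lowest error mean seen so far

    def _bound(self, alpha, n):
        # Hoeffding bound: the sample mean deviates from the true mean
        # by more than this with probability at most alpha.
        return math.sqrt(math.log(1.0 / alpha) / (2.0 * n))

    def add(self, error):
        """error is the 0-1 loss of the last prediction (0 correct, 1 wrong)."""
        self.n += 1
        self.error_sum += error
        mean = self.error_sum / self.n
        self.min_mean = min(self.min_mean, mean)
        gap = mean - self.min_mean
        if gap > self._bound(self.drift_alpha, self.n):
            return "out-of-control"     # drift detected
        if gap > self._bound(self.warn_alpha, self.n):
            return "warning"            # drift likely approaching
        return "in-control"             # concept remains stable
```

Feeding it a stream of 0-1 losses that suddenly worsens makes it pass through warning before signalling out-of-control.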
Adaptive Learners
◇ Normal learning models usually increase their prediction error rate when a concept drift occurs, as they are no longer in accordance with the more recent data
◇ Uses the 0-1 loss between the predicted class label and the true one for error estimates
◇ Each adaptive learner also uses HDDM, which monitors its error rate in order to trigger the three drift signals during the learning process
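The 0-1 loss mentioned above is the simplest possible error measure; the error rate a detector monitors is just the running mean of these losses (a minimal illustration):

```python
# Minimal illustration of the 0-1 loss used for error estimates.
def zero_one_loss(y_pred, y_true):
    """0 for a correct prediction, 1 for a mistake."""
    return 0 if y_pred == y_true else 1

pairs = [("a", "a"), ("b", "a"), ("a", "a"), ("b", "b")]
losses = [zero_one_loss(p, t) for p, t in pairs]
print(sum(losses) / len(losses))  # 0.25 -- the running error rate
```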
Hoeffding Drift Detection Method: Adaptive Learners
◇ Each adaptive learner uses a single classifier for stable concepts
■ At Warning, the learner starts to train an alternative classifier
■ At Out-of-Control, the learner replaces its classifier with the alternative
◇ Adaptive learners can therefore have at most two classifiers
◇ The predictions of these two are combined using weighted voting
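The adaptive learner's reaction to the three signals can be sketched as a small state machine. This is a sketch under assumptions: the detector, the classifier factory, and the fixed vote weights are all illustrative stand-ins (FASE weights the two classifiers by their estimated accuracy).

```python
class AdaptiveLearner:
    """Sketch of an adaptive learner reacting to HDDM-style drift signals."""

    def __init__(self, make_classifier, detector):
        self.make_classifier = make_classifier
        self.detector = detector
        self.main = make_classifier()
        self.alternative = None            # trained only during warnings

    def learn(self, x, y):
        error = 0 if self.main.predict(x) == y else 1   # 0-1 loss
        state = self.detector.add(error)
        if state == "warning" and self.alternative is None:
            self.alternative = self.make_classifier()
        elif state == "out-of-control":
            # Replace the outdated classifier with the alternative one.
            self.main = self.alternative or self.make_classifier()
            self.alternative = None
        self.main.fit(x, y)
        if self.alternative is not None:
            self.alternative.fit(x, y)

    def predict(self, x):
        # Weighted vote between the (at most) two classifiers; the
        # weights here are placeholders for accuracy-based weights.
        members = [(self.main, 1.0)]
        if self.alternative is not None:
            members.append((self.alternative, 0.5))
        votes = {}
        for clf, w in members:
            label = clf.predict(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)
```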
FASE
◇ FASE can be viewed as a three-level ensemble of classifiers, each level able to handle concept drift explicitly because its adaptive learners adapt themselves individually
◇ The meta-learner of FASE receives as input meta-instances whose attributes are nominal
◇ FASE uses the test-then-train approach to generate these meta-instances
◇ Adaptive learners are essential to this algorithm, as they are used as both the base and meta classifiers
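The test-then-train generation of meta-instances can be sketched as follows (the `predict`/`learn` method names are illustrative assumptions): each incoming instance is first used for testing, the base predictions become the nominal attributes of a meta-instance, and only afterwards does any learner train on the true label.

```python
# Sketch of FASE's test-then-train loop for one stream instance.
def process_instance(base_learners, meta_learner, x, y):
    # Test: base predictions become the nominal attributes
    # of the meta-instance.
    meta_attrs = [bl.predict(x) for bl in base_learners]
    # The meta-learner is tested on the meta-instance...
    y_hat = meta_learner.predict(meta_attrs)
    # ...then trained on it with the true label.
    meta_learner.learn(meta_attrs, y)
    # Train: base learners see the true label only after predicting.
    for bl in base_learners:
        bl.learn(x, y)
    return y_hat
```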
Empirical Study
◇ Instances arrived one at a time, in temporal order
◇ Classifiers were first tested and then trained
◇ A sliding window of size 100 was used
◇ Naive Bayes and Perceptron were used as base learning algorithms
■ Both have constant time and space complexity
■ Both can process weighted instances, which makes them suitable for the bagging approach used in FASE
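The bagging approach referred to here is online bagging in the style of OzaBag, where each instance is given to each base learner with weight k ~ Poisson(1), approximating bootstrap resampling on a stream. The sketch below illustrates this; the `learn(x, y, weight)` signature is an assumption, standing in for whatever weighted-update interface the base learners expose.

```python
import math
import random

def poisson(lam=1.0, rng=random):
    """Draw from a Poisson distribution (Knuth's method, small lambda)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def online_bagging_update(base_learners, x, y):
    """Online bagging: present the instance k ~ Poisson(1) times per learner."""
    for learner in base_learners:
        k = poisson(1.0)
        if k > 0:
            # NB and Perceptron accept weighted instances, so a single
            # call with weight k replaces k repeated presentations.
            learner.learn(x, y, weight=k)
```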
Empirical Study: Algorithms
◇ FASE
■ The new algorithm, using Naïve Bayes (NB) or Perceptron as base classifier and meta-learner.
◇ OzaBag
■ The online version of bagging for data streams
◇ OzaBag-ADWIN
■ The ensemble method based on bagging, using NB or Perceptron as base classifier and ADWIN for change detection.
◇ OzaBag-HDDMA-test
■ Similar to OzaBag-ADWIN, but using HDDMA-test instead of ADWIN as change detector and estimator, in combination with NB or Perceptron as base classifier.
◇ HDDMA-test
■ The Hoeffding-based Drift Detection Method combined with NB or Perceptron.
◇ DDM
■ The Drift Detection Method combined with NB or Perceptron.
Table 1: Predictive performance (X̅ ± Std) using Naive Bayes as the base classifier

Algorithm         | Spam          | Elec2         | USE1          | USE2          | COV           | NURS          | No. of Wins
FASE              | 92.03 ± 6.35  | 85.25 ± 6.40  | 76.93 ± 10.65 | 74.00 ± 11.25 | 88.10 ± 7.55  | 93.22 ± 6.07  | 6
OzaBag            | 90.44 ± 10.97 | 74.26 ± 14.63 | 62.87 ± 23.92 | 71.93 ± 11.43 | 60.55 ± 21.76 | 84.12 ± 14.18 | 0
OzaBag-ADWIN      | 90.53 ± 10.84 | 78.91 ± 12.13 | 64.07 ± 20.50 | 72.33 ± 11.38 | 83.06 ± 11.94 | 90.15 ± 9.41  | 0
OzaBag-HDDMA-test | 91.33 ± 6.64  | 83.78 ± 7.31  | 74.20 ± 11.98 | 70.87 ± 13.11 | 86.21 ± 8.55  | 92.52 ± 6.56  | 0
HDDMA-test        | 90.67 ± 9.26  | 85.09 ± 6.32  | 75.20 ± 11.20 | 71.00 ± 12.84 | 87.44 ± 7.97  | 92.51 ± 6.48  | 0
DDM               | 89.50 ± 13.82 | 82.70 ± 8.69  | 73.73 ± 12.26 | 72.93 ± 11.68 | 88.03 ± 8.35  | 91.72 ± 7.09  | 0
Naive Bayes       | 90.63 ± 10.87 | 74.17 ± 14.67 | 63.33 ± 22.84 | 72.13 ± 11.15 | 60.53 ± 21.76 | 83.35 ± 14.88 | 0
Table 2: Predictive performance (X̅ ± Std) using Perceptron as the base classifier

Algorithm         | Spam          | Elec2         | USE1          | USE2          | COV           | NURS          | No. of Wins
FASE              | 97.14 ± 3.60  | 67.87 ± 7.74  | 74.00 ± 9.97  | 73.80 ± 9.16  | 80.29 ± 20.36 | 46.28 ± 28.29 | 2
OzaBag            | 97.33 ± 3.20  | 42.44 ± 14.00 | 71.60 ± 12.59 | 73.27 ± 8.94  | 48.75 ± 32.12 | 82.42 ± 14.81 | 0
OzaBag-ADWIN      | 97.29 ± 3.20  | 42.43 ± 13.99 | 71.20 ± 12.89 | 73.27 ± 8.94  | 47.71 ± 32.08 | 84.46 ± 12.63 | 0
OzaBag-HDDMA-test | 97.34 ± 3.19  | 42.43 ± 13.99 | 72.47 ± 10.61 | 73.33 ± 8.99  | 25.77 ± 30.19 | 85.71 ± 9.92  | 2
HDDMA-test        | 97.12 ± 3.30  | 42.43 ± 13.99 | 75.40 ± 9.97  | 74.40 ± 9.16  | 26.43 ± 31.09 | 82.62 ± 12.06 | 1
DDM               | 96.94 ± 5.00  | 42.43 ± 13.99 | 74.53 ± 11.14 | 75.13 ± 9.17  | 46.53 ± 32.89 | 83.65 ± 10.92 | 1
Perceptron        | 97.16 ± 3.30  | 42.44 ± 14.00 | 73.20 ± 11.77 | 74.87 ± 8.29  | 48.75 ± 32.12 | 75.88 ± 18.95 | 0
Thanks!
Any questions?