Fast Adaptive Stacking Ensemble (FASE)
Data Mining
◇ Many real-world problems constantly generate large volumes of data
◇ The target function to be learned can change over time
◇ This is called Concept Drift
◇ Previously learned models can become outdated, or even contradictory with respect to the most recent data
Ensembles
◇ Combine the predictions from several base classifiers
■ improving on the predictive accuracy obtained by a single classifier
◇ Ensembles have three main components
■ A base learning algorithm
■ A method to weight training examples
■ A voting procedure
○ e.g. majority voting or weighted voting
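The two voting procedures above can be sketched as follows (a minimal illustration, not FASE's exact implementation; the weights in the weighted vote would come from, e.g., each classifier's estimated accuracy):

```python
from collections import defaultdict

def majority_vote(predictions):
    """Each base classifier gets one vote; the most frequent label wins."""
    counts = defaultdict(int)
    for label in predictions:
        counts[label] += 1
    return max(counts, key=counts.get)

def weighted_vote(predictions, weights):
    """Each vote is scaled by the classifier's weight."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

print(majority_vote(["spam", "ham", "spam"]))                  # spam
print(weighted_vote(["spam", "ham", "ham"], [0.9, 0.4, 0.4]))  # spam (0.9 > 0.8)
```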
Concept Drift with Ensembles
◇ Performance measures assess the ensemble's consistency with new data
◇ Significant changes in performance values are interpreted as concept drift
◇ In response, the ensemble dynamically eliminates, reactivates, or adds base classifiers
Why F.A.S.E.?
◇ Majority voting has been the primary way of making predictions in ensembles
◇ The underlying relationship between base classifiers and true prediction may be more complex than a simple linear combination of predictions.
◇ Uses Adaptive learners that deal with Drifting data
Hoeffding Drift Detection Method
◇ When a change is detected, the worst classifier is removed and a new one is added
◇ HDDM triggers
■ In-Control
○ When the current concept remains stable
■ Warning
○ When a concept drift is likely to be approaching
■ Out of Control
○ When a concept drift is detected
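The three trigger states can be sketched with a simplified Hoeffding-bound detector. This is an illustrative sketch, not the published HDDM formulas: it compares the current error mean against the lowest mean seen so far, flagging a warning or drift when the gap exceeds the Hoeffding bound at the corresponding confidence level.

```python
import math

class HoeffdingDriftDetector:
    """Simplified Hoeffding-bound drift detector (illustrative, not exact HDDM)."""

    def __init__(self, warn_alpha=0.005, drift_alpha=0.001):
        self.warn_alpha, self.drift_alpha = warn_alpha, drift_alpha
        self.n = 0                      # errors observed so far
        self.error_sum = 0.0
        self.min_mean = float("inf")    # lowest error mean seen so far

    def _bound(self, alpha, n):
        # Hoeffding bound: the sample mean deviates from the true mean
        # by more than this with probability at most alpha.
        return math.sqrt(math.log(1.0 / alpha) / (2.0 * n))

    def add(self, error):
        """error is the 0-1 loss of the last prediction (0 correct, 1 wrong)."""
        self.n += 1
        self.error_sum += error
        mean = self.error_sum / self.n
        self.min_mean = min(self.min_mean, mean)
        gap = mean - self.min_mean
        if gap > self._bound(self.drift_alpha, self.n):
            return "out-of-control"     # drift detected
        if gap > self._bound(self.warn_alpha, self.n):
            return "warning"            # drift likely approaching
        return "in-control"             # concept remains stable
```

Feeding it a stream of 0-1 losses that suddenly worsens makes it pass through warning before signalling out-of-control.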
Adaptive Learners
◇ Normal learning models usually increase their prediction error rate when a concept drift occurs, as they are no longer in accordance with the more recent data
◇ Uses the 0-1 loss between the predicted class label and the true one for error estimates
◇ Each adaptive learner also uses HDDM, which monitors its error rate in order to trigger the three drift signals during the learning process
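The 0-1 loss mentioned above is the simplest possible error measure; the error rate a detector monitors is just the running mean of these losses (a minimal illustration):

```python
# Minimal illustration of the 0-1 loss used for error estimates.
def zero_one_loss(y_pred, y_true):
    """0 for a correct prediction, 1 for a mistake."""
    return 0 if y_pred == y_true else 1

pairs = [("a", "a"), ("b", "a"), ("a", "a"), ("b", "b")]
losses = [zero_one_loss(p, t) for p, t in pairs]
print(sum(losses) / len(losses))  # 0.25 -- the running error rate
```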
Hoeffding Drift Detection Method: Adaptive Learners
◇ Each adaptive learner uses a single classifier for stable concepts
■ At Warning, the learner starts to train an alternative classifier
■ At Out-of-Control, the learner replaces its classifier with the alternative
◇ Adaptive learners can therefore have at most two classifiers
◇ The predictions of these two are combined using weighted voting
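The adaptive learner's reaction to the three signals can be sketched as a small state machine. This is a sketch under assumptions: the detector, the classifier factory, and the fixed vote weights are all illustrative stand-ins (FASE weights the two classifiers by their estimated accuracy).

```python
class AdaptiveLearner:
    """Sketch of an adaptive learner reacting to HDDM-style drift signals."""

    def __init__(self, make_classifier, detector):
        self.make_classifier = make_classifier
        self.detector = detector
        self.main = make_classifier()
        self.alternative = None            # trained only during warnings

    def learn(self, x, y):
        error = 0 if self.main.predict(x) == y else 1   # 0-1 loss
        state = self.detector.add(error)
        if state == "warning" and self.alternative is None:
            self.alternative = self.make_classifier()
        elif state == "out-of-control":
            # Replace the outdated classifier with the alternative one.
            self.main = self.alternative or self.make_classifier()
            self.alternative = None
        self.main.fit(x, y)
        if self.alternative is not None:
            self.alternative.fit(x, y)

    def predict(self, x):
        # Weighted vote between the (at most) two classifiers; the
        # weights here are placeholders for accuracy-based weights.
        members = [(self.main, 1.0)]
        if self.alternative is not None:
            members.append((self.alternative, 0.5))
        votes = {}
        for clf, w in members:
            label = clf.predict(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)
```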
FASE
◇ FASE can be viewed as a three-level ensemble of classifiers, each level able to handle concept drift explicitly because its adaptive learners adapt themselves individually
◇ The meta-learner of FASE receives as input meta-instances whose attributes are nominal
◇ FASE uses the test-then-train approach to generate these meta-instances
◇ Adaptive learners are essential to this algorithm, as they are used as both the base and meta classifiers
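The test-then-train generation of meta-instances can be sketched as follows (the `predict`/`learn` method names are illustrative assumptions): each incoming instance is first used for testing, the base predictions become the nominal attributes of a meta-instance, and only afterwards does any learner train on the true label.

```python
# Sketch of FASE's test-then-train loop for one stream instance.
def process_instance(base_learners, meta_learner, x, y):
    # Test: base predictions become the nominal attributes
    # of the meta-instance.
    meta_attrs = [bl.predict(x) for bl in base_learners]
    # The meta-learner is tested on the meta-instance...
    y_hat = meta_learner.predict(meta_attrs)
    # ...then trained on it with the true label.
    meta_learner.learn(meta_attrs, y)
    # Train: base learners see the true label only after predicting.
    for bl in base_learners:
        bl.learn(x, y)
    return y_hat
```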
Empirical Study
◇ Instances arrived one at a time, in temporal order
◇ Classifiers were first tested and then trained
◇ A sliding window of size 100 was used
◇ Naive Bayes and Perceptron were used as base learning algorithms
■ Both have constant time and space complexity
■ Both can process weighted instances, which makes them suitable for the bagging approach used in FASE
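The bagging approach referred to here is online bagging in the style of OzaBag, where each instance is given to each base learner with weight k ~ Poisson(1), approximating bootstrap resampling on a stream. The sketch below illustrates this; the `learn(x, y, weight)` signature is an assumption, standing in for whatever weighted-update interface the base learners expose.

```python
import math
import random

def poisson(lam=1.0, rng=random):
    """Draw from a Poisson distribution (Knuth's method, small lambda)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def online_bagging_update(base_learners, x, y):
    """Online bagging: present the instance k ~ Poisson(1) times per learner."""
    for learner in base_learners:
        k = poisson(1.0)
        if k > 0:
            # NB and Perceptron accept weighted instances, so a single
            # call with weight k replaces k repeated presentations.
            learner.learn(x, y, weight=k)
```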
Empirical Study: Algorithms
◇ FASE
■ The new algorithm, using Naïve Bayes (NB) or Perceptron as base classifier and meta-learner.
◇ OzaBag
■ The online version of bagging for data streams
◇ OzaBag-ADWIN
■ The ensemble method based on bagging, using NB or Perceptron as base classifier and ADWIN for change detection.
◇ OzaBag-HDDMA-test
■ Similar to OzaBag-ADWIN, but using HDDMA-test instead of ADWIN as change detector and estimator, in combination with NB or Perceptron as base classifier.
◇ HDDMA-test
■ The Hoeffding-based Drift Detection Method combined with NB or Perceptron.
◇ DDM
■ The Drift Detection Method combined with NB or Perceptron.
Table 1: Predictive performance (X̅ ± Std) using Naive Bayes as the base classifier

Algorithm         | Spam          | Elec2         | USE1          | USE2          | COV           | NURS          | No. of Wins
FASE              | 92.03 ± 6.35  | 85.25 ± 6.40  | 76.93 ± 10.65 | 74.00 ± 11.25 | 88.10 ± 7.55  | 93.22 ± 6.07  | 6
OzaBag            | 90.44 ± 10.97 | 74.26 ± 14.63 | 62.87 ± 23.92 | 71.93 ± 11.43 | 60.55 ± 21.76 | 84.12 ± 14.18 | 0
OzaBag-ADWIN      | 90.53 ± 10.84 | 78.91 ± 12.13 | 64.07 ± 20.50 | 72.33 ± 11.38 | 83.06 ± 11.94 | 90.15 ± 9.41  | 0
OzaBag-HDDMA-test | 91.33 ± 6.64  | 83.78 ± 7.31  | 74.20 ± 11.98 | 70.87 ± 13.11 | 86.21 ± 8.55  | 92.52 ± 6.56  | 0
HDDMA-test        | 90.67 ± 9.26  | 85.09 ± 6.32  | 75.20 ± 11.20 | 71.00 ± 12.84 | 87.44 ± 7.97  | 92.51 ± 6.48  | 0
DDM               | 89.50 ± 13.82 | 82.70 ± 8.69  | 73.73 ± 12.26 | 72.93 ± 11.68 | 88.03 ± 8.35  | 91.72 ± 7.09  | 0
Naive Bayes       | 90.63 ± 10.87 | 74.17 ± 14.67 | 63.33 ± 22.84 | 72.13 ± 11.15 | 60.53 ± 21.76 | 83.35 ± 14.88 | 0
Table 2: Predictive performance (X̅ ± Std) using Perceptron as the base classifier

Algorithm         | Spam          | Elec2         | USE1          | USE2          | COV           | NURS          | No. of Wins
FASE              | 97.14 ± 3.60  | 67.87 ± 7.74  | 74.00 ± 9.97  | 73.80 ± 9.16  | 80.29 ± 20.36 | 46.28 ± 28.29 | 2
OzaBag            | 97.33 ± 3.20  | 42.44 ± 14.00 | 71.60 ± 12.59 | 73.27 ± 8.94  | 48.75 ± 32.12 | 82.42 ± 14.81 | 0
OzaBag-ADWIN      | 97.29 ± 3.20  | 42.43 ± 13.99 | 71.20 ± 12.89 | 73.27 ± 8.94  | 47.71 ± 32.08 | 84.46 ± 12.63 | 0
OzaBag-HDDMA-test | 97.34 ± 3.19  | 42.43 ± 13.99 | 72.47 ± 10.61 | 73.33 ± 8.99  | 25.77 ± 30.19 | 85.71 ± 9.92  | 2
HDDMA-test        | 97.12 ± 3.30  | 42.43 ± 13.99 | 75.40 ± 9.97  | 74.40 ± 9.16  | 26.43 ± 31.09 | 82.62 ± 12.06 | 1
DDM               | 96.94 ± 5.00  | 42.43 ± 13.99 | 74.53 ± 11.14 | 75.13 ± 9.17  | 46.53 ± 32.89 | 83.65 ± 10.92 | 1
Perceptron        | 97.16 ± 3.30  | 42.44 ± 14.00 | 73.20 ± 11.77 | 74.87 ± 8.29  | 48.75 ± 32.12 | 75.88 ± 18.95 | 0
Thanks!
Any questions?