
(1)

Pairwise Combination of Classifiers for Ensemble Learning on Data Streams

COMPX523 Presentation

Paper by Heitor Murilo Gomes, Jean Paul Barddal and Fabrício Enembreck

Presentation by Hongyu Wang

(2)

Introduction

We’ll discuss two ensemble learning pairwise voting strategies:

● Pairwise Accuracy (PA)

● Pairwise Patterns (PP)

What are voting strategies?

- Ways of combining the predictions made by individual classifiers in an ensemble.

(3)

Background & Motivation

Classifier diversity is very important to ensemble learning.

Some degree of overlap between the classifiers can almost always be expected.

Pairwise Accuracy and Pairwise Patterns can use the overlaps to support ensemble prediction.

(4)

Pairwise Accuracy

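For each possible pair of classifiers, c_i and c_j, we can calculate their shared accuracy (the fraction of instances that both classify correctly) and their shared error rate (the fraction that both misclassify).

We also need the accuracy of each individual classifier.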
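A minimal sketch of how these three quantities could be tracked as labelled instances arrive. The class name `PairwiseStats` and its methods are hypothetical, and the counts run over every instance seen so far; whether the paper uses a window or decay instead is not stated on these slides.

```python
from itertools import combinations


class PairwiseStats:
    """Running counts behind shared accuracy, shared error and individual accuracy."""

    def __init__(self, n_classifiers):
        self.n_seen = 0
        self.correct = [0] * n_classifiers          # times classifier i was right
        pairs = list(combinations(range(n_classifiers), 2))
        self.both_correct = {p: 0 for p in pairs}   # times i and j were both right
        self.both_wrong = {p: 0 for p in pairs}     # times i and j were both wrong

    def update(self, predictions, true_label):
        """Record one labelled instance; predictions[i] plays the role of h_i(x)."""
        self.n_seen += 1
        hits = [p == true_label for p in predictions]
        for i, hit in enumerate(hits):
            self.correct[i] += int(hit)
        for i, j in self.both_correct:
            if hits[i] and hits[j]:
                self.both_correct[(i, j)] += 1
            elif not hits[i] and not hits[j]:
                self.both_wrong[(i, j)] += 1

    def accuracy(self, i):
        return self.correct[i] / self.n_seen if self.n_seen else 0.0

    def shared_accuracy(self, i, j):
        # Pairs are keyed with i < j.
        return self.both_correct[(i, j)] / self.n_seen if self.n_seen else 0.0

    def shared_error(self, i, j):
        # Pairs are keyed with i < j.
        return self.both_wrong[(i, j)] / self.n_seen if self.n_seen else 0.0
```

Feeding it 20 instances split 13 / 4 / 2 / 1 between "both correct", "only c_i correct", "only c_j correct" and "both wrong" gives the 85%, 75%, 65% and 5% figures used in the example slide.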

(5)

Pairwise Accuracy Continued

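During prediction, we have a vector, v, to store votes for the labels. If a pair of classifiers, c_i and c_j, vote for the same label, i.e. h_i(x) == h_j(x), that label's vote increases by the pair's shared accuracy minus its shared error rate.

If they vote for different labels, i.e. h_i(x) != h_j(x), each predicted label's vote increases by the accuracy of the classifier that chose it minus the pair's shared accuracy.

The label with the most votes in the end is the overall prediction.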
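A sketch of the voting loop, assuming the amounts added to v follow the arithmetic of this deck's worked example (0.65 - 0.05 when a pair agrees; 0.85 - 0.65 and 0.75 - 0.65 when it disagrees). The function name and argument layout are hypothetical, not taken from the paper.

```python
from itertools import combinations


def pairwise_accuracy_vote(predictions, accuracy, shared_acc, shared_err):
    """Return (winning label, vote vector v) for one instance.

    predictions[i] plays the role of h_i(x); accuracy[i] is the accuracy of c_i;
    shared_acc[(i, j)] and shared_err[(i, j)] are keyed by pairs with i < j.
    """
    if len(predictions) < 2:
        raise ValueError("pairwise voting needs at least two classifiers")
    votes = {}
    for i, j in combinations(range(len(predictions)), 2):
        label_i, label_j = predictions[i], predictions[j]
        if label_i == label_j:
            # Agreement: add shared accuracy minus shared error to the common label.
            gain = shared_acc[(i, j)] - shared_err[(i, j)]
            votes[label_i] = votes.get(label_i, 0.0) + gain
        else:
            # Disagreement: each label gets its own classifier's accuracy
            # minus the pair's shared accuracy.
            votes[label_i] = votes.get(label_i, 0.0) + accuracy[i] - shared_acc[(i, j)]
            votes[label_j] = votes.get(label_j, 0.0) + accuracy[j] - shared_acc[(i, j)]
    winner = max(votes, key=votes.get)
    return winner, votes
```

With more than two classifiers, every pair contributes to the same vector v, so a label backed by several accurate, agreeing pairs accumulates the largest total.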

(6)

Pairwise Accuracy Example

Suppose we have two classifiers, c_i with an accuracy of 85% and c_j with an accuracy of 75%, which have a shared accuracy of 65% and a shared error rate of 5%.

If they both predict label 0, label 0 receives a vote of 0.65 - 0.05 = 0.6.

If c_i votes for label 0 and c_j votes for label 1, label 0 receives a vote of 0.85 - 0.65 = 0.2 and label 1 receives a vote of 0.75 - 0.65 = 0.1.

              c_j correct   c_j error
c_i correct       65%          20%
c_i error         10%           5%

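Votes added when both predict label 0: label 0 +0.6, label 1 unchanged.

Votes added when c_i predicts label 0 and c_j predicts label 1: label 0 +0.2, label 1 +0.1.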
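The vote sizes for the disagreement case can be read directly off the table: a classifier's accuracy is the sum of its "correct" row or column, so subtracting the shared accuracy leaves the cell where only that classifier is right. A short check of the arithmetic, using acc and sacc as assumed shorthand for accuracy and shared accuracy (these symbols are not from the slides):

```latex
\begin{align*}
\mathrm{acc}(c_i) &= 0.65 + 0.20 = 0.85, &
\mathrm{acc}(c_i) - \mathrm{sacc}(c_i, c_j) &= 0.85 - 0.65 = 0.20 \\
\mathrm{acc}(c_j) &= 0.65 + 0.10 = 0.75, &
\mathrm{acc}(c_j) - \mathrm{sacc}(c_i, c_j) &= 0.75 - 0.65 = 0.10
\end{align*}
```

In this example, then, label 0 is rewarded with the 20% of cases where c_i is right and c_j is wrong, and label 1 with the 10% of cases where the reverse holds.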

(7)

Pairwise Patterns

Pairwise Patterns doesn’t evaluate individual classifiers.

Instead, it records the relation between the pair of predicted labels and the correct label for each pair of classifiers and each training instance.

For prediction, it refers to a matrix of records constructed during training and votes according to the prediction pattern.

(8)

Pairwise Patterns Example

Assume a pair of classifiers that haven’t predicted anything together yet, so their matrix is 0 across all patterns for all labels.

Now some training instances come in. In 7 cases where the first classifier predicts a while the second predicts b, the correct labels are a once, b once and c five times. The training process increments the counts of a, b and c accordingly for pattern (a,b).

For prediction, if the first classifier votes for a and the second votes for b, the pair gives 1 vote to a, 1 vote to b and 5 votes to c as a result of the pattern (a,b).

Before training:

          a     b     c
...      ...   ...   ...
(a,b)     0     0     0
...      ...   ...   ...

After the 7 training instances:

          a     b     c
...      ...   ...   ...
(a,b)     1     1     5
...      ...   ...   ...
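The training and prediction steps of this example can be sketched as follows. The class name `PairwisePatterns`, the use of raw counts as votes and the `None` result for a pattern never seen in training are assumptions made for illustration; the slides do not say whether the paper normalises or windows the counts.

```python
from collections import defaultdict
from itertools import combinations


class PairwisePatterns:
    """One pattern matrix per classifier pair: (label_i, label_j) -> true-label counts."""

    def __init__(self, n_classifiers):
        self.pairs = list(combinations(range(n_classifiers), 2))
        # counts[(i, j)][(label_i, label_j)][true_label] -> number of occurrences
        self.counts = {p: defaultdict(lambda: defaultdict(int)) for p in self.pairs}

    def train(self, predictions, true_label):
        """Increment the count of true_label under each pair's prediction pattern."""
        for i, j in self.pairs:
            pattern = (predictions[i], predictions[j])
            self.counts[(i, j)][pattern][true_label] += 1

    def predict(self, predictions):
        """Sum the recorded counts for each pair's current pattern and pick the top label."""
        votes = defaultdict(int)
        for i, j in self.pairs:
            pattern = (predictions[i], predictions[j])
            # .get() avoids creating an empty row for a pattern not seen in training.
            for label, count in self.counts[(i, j)].get(pattern, {}).items():
                votes[label] += count
        if not votes:
            return None, {}
        return max(votes, key=votes.get), dict(votes)
```

Training on the 7 instances from the example and then predicting with pattern (a,b) yields votes of 1, 1 and 5 for a, b and c, so c wins even though neither classifier predicted it.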

(9)

Experiments

The authors of the paper tested 10 data streams: 4 real datasets (Spam Corpus (SPAM), Forest Covertype (COVT), Airlines (AIRL), Electricity (ELEC)) and 6 synthetic data streams produced with the SEA generator (SEA), Agrawal generator (AGR), Random tree generator and Hyperplane generator (Hyper).

PA was tested with Generic Ensemble, and PP was tested with Generic Ensemble and Leveraging Bagging. A few other ensemble methods were also tested for benchmarking purposes.

(10)

Results

PA and PP gave a statistically significant boost to the performance of Generic Ensemble and Leveraging Bagging for some data streams compared with the default strategy.

PA is able to adapt to drifts relatively fast and PP can utilise patterns even if the individual classifiers are relatively poor.

(11)

Results Continued

(12)

Summary

Pairwise Accuracy

● emphasises agreement between good classifiers

● adapts to change

Pairwise Patterns

● extracts information from prediction patterns

● can use the patterns well even with poor classifiers