Unsupervised Context Switch for Classification Tasks on Data
Streams with Recurrent Concepts
Denis M. dos Reis André G. Maletzke
Gustavo E. A. P. A. Batista
Challenge
Strong classification models are expected to deal with change
- Supervised learning can adapt by retraining on data in which more recent instances are emphasised [ADWIN, EWMA]
But what about streams where, after initial training, true labels aren't (and may never be) available? [Verification Latency]
- Is it even possible to detect change without true labels?
- Even if it is, how do we adjust our model without new training instances?
Answer: not always
Without true labels, a change that leaves the distribution of the incoming (unlabeled) instances unchanged is undetectable
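The undetectability claim can be illustrated with a toy sketch (numpy; all numbers are illustrative, not from the poster): the class-conditional distributions swap while the observed feature distribution stays identical, so no label-free detector can notice the drift, yet a model trained before it fails badly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Before drift: class 0 ~ N(0, 1), class 1 ~ N(3, 1), equal priors.
y_before = rng.integers(0, 2, 10000)
x_before = rng.normal(3 * y_before, 1.0)

# After drift: the class-conditional distributions swap
# (class 0 ~ N(3, 1), class 1 ~ N(0, 1)); priors are unchanged.
y_after = rng.integers(0, 2, 10000)
x_after = rng.normal(3 * (1 - y_after), 1.0)

# The *feature* distributions are statistically indistinguishable,
# so no detector that only sees x can flag this drift...
print(np.mean(x_before), np.mean(x_after))  # both ~1.5

# ...yet a decision rule fitted before the drift is now badly wrong:
pred_after = (x_after > 1.5).astype(int)    # old decision rule
print(np.mean(pred_after == y_after))       # accuracy near zero
```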
Alternative
Instead of adapting to change as it occurs [hard without real labels], exploit the tendency for concept drifts to be recurrent, driven by latent factors:
- Seasonality
- Gender
- Pretty much anything discrete or discretizable
Use this variable to define multiple contexts, and build one classification model per context from the initial training data [Bagging with a twist!]
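"Bagging with a twist" can be sketched as follows: instead of bootstrap samples, partition the initial training data by a discrete context variable and fit one classifier per partition. This assumes scikit-learn; the function name and toy data are illustrative, not from the poster.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_context_ensemble(X, y, context):
    """Fit one model per distinct value of the discrete context variable."""
    models = {}
    for c in np.unique(context):
        mask = context == c
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        model.fit(X[mask], y[mask])
        models[c] = model
    return models

# Toy data: the two contexts invert the labelling rule, so a single global
# model would hover near 50% while each per-context model does well.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
context = rng.integers(0, 2, 400)
y = np.where(context == 0, X[:, 0] > 0, X[:, 0] <= 0).astype(int)

ensemble = train_context_ensemble(X, y, context)
print(sorted(ensemble))  # one model per observed context
```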
Example: defining 2 contexts based on temperature for insect wingbeat frequency
[Figure: WBF-Insects - per-context WBF probability distributions; a larger difference between the distributions implies greater context-sensitivity of the dataset]
Which context to use when classifying?
It is possible to build different classifiers for different contexts. But when predicting unlabeled instances, how can we know which context is currently applicable?
Two proposals:
- Manually pick a feature from which to deduce the context
- Exploit the fact that common classifiers return not just a predicted class but also a confidence/probability value: classify with every context's model and assume the current context is that of the most confident one
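The first proposal can be sketched in a few lines, assuming the contexts were defined by binning a single observable feature such as temperature (the bin edges below are illustrative, not from the poster):

```python
import numpy as np

# Proposal 1 (sketch): deduce the context from one manually chosen feature.
# Contexts are temperature bands; the edges yield 4 contexts:
# <20, [20, 25), [25, 30), >=30.
temperature_edges = np.array([20.0, 25.0, 30.0])

def context_from_feature(temperature):
    """Map the observed feature value to the context band it falls in."""
    return int(np.searchsorted(temperature_edges, temperature))

print(context_from_feature(18.0))  # context 0
print(context_from_feature(27.5))  # context 2
```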
Getting "Confidence" Values from Common Classifiers
- k-Nearest Neighbour: distance(s) to the nearest neighbour(s)
- Support Vector Classifier: distance to the separating hyperplane
- Multi-Layer Perceptron: intensity of the output neuron / Σ(intensities of the output layer)
- Random Forest: (#output-class predictions) / (#total predictions)
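The second proposal (strongest classifier wins) can be sketched with scikit-learn, using `predict_proba` as a uniform stand-in for the per-model heuristics above; for a Random Forest its maximum is close to the (#output-class predictions) / (#total predictions) ratio. The toy ensemble is illustrative, not from the poster.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def predict_by_strongest(models, x):
    """Classify one instance with every per-context model and keep the
    prediction of the most confident one, returning (context, class)."""
    best_ctx, best_conf, best_pred = None, -1.0, None
    for ctx, model in models.items():
        proba = model.predict_proba(x.reshape(1, -1))[0]
        if proba.max() > best_conf:
            best_ctx, best_conf = ctx, proba.max()
            best_pred = model.classes_[proba.argmax()]
    return best_ctx, best_pred

# Toy ensemble: the context-0 model saw a cleanly separable problem and is
# confident; the context-1 model saw pure label noise and stays uncertain.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1))
m0 = RandomForestClassifier(n_estimators=50, random_state=0)
m0.fit(X, (X[:, 0] > 0).astype(int))
m1 = RandomForestClassifier(n_estimators=50, random_state=0)
m1.fit(X, rng.integers(0, 2, 400))
models = {0: m0, 1: m1}

chosen_ctx, pred = predict_by_strongest(models, np.array([2.0]))
print(chosen_ctx, pred)  # the confident context-0 model wins
```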
Experimental Datasets
A "Aedes aegypti vs. Culex quinquefasciatus" - binary classification of two mosquito species; 6 contexts from temperature, WBF used for the feature hypothesis test
B Same as above, but classification into 4 classes (sex × species)
C "Arabic-Digit" - classifying spoken digits [0-9]; sex of speaker gives 2 contexts, first MFCC used for the feature hypothesis test
D Same as above but inverted: digit gives 10 contexts, binary classification of sex
E "Handwritten" - classifying the handwritten letters g, p, q; author gives 10 contexts, area of writing used for the feature hypothesis test
F Same as above but inverted: letter gives 3 contexts, classification of 10 authors
G “WBF-Insects” (example data given earlier)
Experimental Results (Context)
Experimental Results
- A0 - performance using the feature chosen for the hypothesis test
- Af - performance using the strongest classifier
- BR - baseline using a random classifier from the ensemble
- BU - baseline using a single classifier without context subsetting
- T - topline performance using the actual correct context
Summary
- Context-based classification is useful on data streams with recurrent concepts
- Effectiveness depends on how reliably we are able to select the correct context
- Not a good idea when drifts are not detectable, or occur without correlation to some observed feature
- Unsupervised, so predictions get worse as time passes further from the original training data