Unsupervised Context Switch for Classification Tasks on Data
Streams with Recurrent Concepts
Denis M. dos Reis André G. Maletzke
Gustavo E. A. P. A. Batista
Challenge
Strong classification models are expected to deal with change
- Supervised learning can adapt by retraining on data in which more recent instances are emphasised [ADWIN, EWMA]
But what about streams where, after initial training, true labels aren't (and may never be) available? [Verification Latency]
- Is it even possible to detect change without true labels?
- Even if it is, how do we adjust our model without new training instances?
Answer: not always
Without true labels, a change that leaves the distribution of the incoming (unlabeled) instances unchanged is undetectable
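The undetectability claim can be illustrated with a toy sketch (numpy; all numbers are illustrative, not from the poster): the class-conditional distributions swap while the observed feature distribution stays identical, so no label-free detector can notice the drift, yet a model trained before it fails badly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Before drift: class 0 ~ N(0, 1), class 1 ~ N(3, 1), equal priors.
y_before = rng.integers(0, 2, 10000)
x_before = rng.normal(3 * y_before, 1.0)

# After drift: the class-conditional distributions swap
# (class 0 ~ N(3, 1), class 1 ~ N(0, 1)); priors are unchanged.
y_after = rng.integers(0, 2, 10000)
x_after = rng.normal(3 * (1 - y_after), 1.0)

# The *feature* distributions are statistically indistinguishable,
# so no detector that only sees x can flag this drift...
print(np.mean(x_before), np.mean(x_after))  # both ~1.5

# ...yet a decision rule fitted before the drift is now badly wrong:
pred_after = (x_after > 1.5).astype(int)    # old decision rule
print(np.mean(pred_after == y_after))       # accuracy near zero
```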
Alternative
Instead of adapting to change as it occurs [hard without real labels], exploit the tendency for concept drifts to be recurrent, driven by latent factors:
- Seasonality
- Gender
- Pretty much anything discrete or discretizable
Use this variable to define multiple contexts, and build one classification model per context from the initial training data [Bagging with a twist!]
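"Bagging with a twist" can be sketched as follows: instead of bootstrap samples, partition the initial training data by a discrete context variable and fit one classifier per partition. This assumes scikit-learn; the function name and toy data are illustrative, not from the poster.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_context_ensemble(X, y, context):
    """Fit one model per distinct value of the discrete context variable."""
    models = {}
    for c in np.unique(context):
        mask = context == c
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        model.fit(X[mask], y[mask])
        models[c] = model
    return models

# Toy data: the two contexts invert the labelling rule, so a single global
# model would hover near 50% while each per-context model does well.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
context = rng.integers(0, 2, 400)
y = np.where(context == 0, X[:, 0] > 0, X[:, 0] <= 0).astype(int)

ensemble = train_context_ensemble(X, y, context)
print(sorted(ensemble))  # one model per observed context
```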
Example: defining 2 contexts based on temperature for insect wingbeat frequency
[Figure: WBF-Insects - per-context WBF probability distributions; a larger difference between the distributions implies greater context-sensitivity of the dataset]
Which context to use when classifying?
It is possible to build different classifiers for different contexts. But when predicting unlabeled instances, how can we know which context is currently applicable?
Two proposals:
- Manually pick a feature from which to deduce the context
- Exploit the fact that common classifiers return not just a predicted class but also a confidence/probability value: classify with every context's model and assume the current context is that of the most confident one
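The first proposal can be sketched in a few lines, assuming the contexts were defined by binning a single observable feature such as temperature (the bin edges below are illustrative, not from the poster):

```python
import numpy as np

# Proposal 1 (sketch): deduce the context from one manually chosen feature.
# Contexts are temperature bands; the edges yield 4 contexts:
# <20, [20, 25), [25, 30), >=30.
temperature_edges = np.array([20.0, 25.0, 30.0])

def context_from_feature(temperature):
    """Map the observed feature value to the context band it falls in."""
    return int(np.searchsorted(temperature_edges, temperature))

print(context_from_feature(18.0))  # context 0
print(context_from_feature(27.5))  # context 2
```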
Getting "Confidence" Values from Common Classifiers
- k-Nearest Neighbour: distance(s) to the nearest neighbour(s)
- Support Vector Classifier: distance to the separating hyperplane
- Multi-Layer Perceptron: intensity of the output neuron / Σ(intensities of the output layer)
- Random Forest: (#output-class predictions) / (#total predictions)
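The second proposal (strongest classifier wins) can be sketched with scikit-learn, using `predict_proba` as a uniform stand-in for the per-model heuristics above; for a Random Forest its maximum is close to the (#output-class predictions) / (#total predictions) ratio. The toy ensemble is illustrative, not from the poster.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def predict_by_strongest(models, x):
    """Classify one instance with every per-context model and keep the
    prediction of the most confident one, returning (context, class)."""
    best_ctx, best_conf, best_pred = None, -1.0, None
    for ctx, model in models.items():
        proba = model.predict_proba(x.reshape(1, -1))[0]
        if proba.max() > best_conf:
            best_ctx, best_conf = ctx, proba.max()
            best_pred = model.classes_[proba.argmax()]
    return best_ctx, best_pred

# Toy ensemble: the context-0 model saw a cleanly separable problem and is
# confident; the context-1 model saw pure label noise and stays uncertain.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1))
m0 = RandomForestClassifier(n_estimators=50, random_state=0)
m0.fit(X, (X[:, 0] > 0).astype(int))
m1 = RandomForestClassifier(n_estimators=50, random_state=0)
m1.fit(X, rng.integers(0, 2, 400))
models = {0: m0, 1: m1}

chosen_ctx, pred = predict_by_strongest(models, np.array([2.0]))
print(chosen_ctx, pred)  # the confident context-0 model wins
```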
Experimental Datasets
A "Aedes aegypti vs. Culex quinquefasciatus" - binary classification of two mosquito species; 6 contexts from temperature, WBF used for the feature hypothesis test
B Same as above, but classification into 4 classes (sex × species)
C "Arabic-Digit" - classifying spoken digits [0-9]; sex of speaker gives 2 contexts, first MFCC used for the feature hypothesis test
D Same as above but inverted: digit gives 10 contexts, binary classification of sex
E "Handwritten" - classifying the handwritten letters g, p, q; author gives 10 contexts, area of writing used for the feature hypothesis test
F Same as above but inverted: letter gives 3 contexts, classification of 10 authors
G “WBF-Insects” (example data given earlier)
Experimental Results (Context)
Experimental Results
- A0 - performance using the feature chosen for the hypothesis test
- Af - performance using the strongest classifier
- BR - baseline using a random classifier from the ensemble
- BU - baseline using a single classifier without context subsetting
- T - topline performance using the actual correct context
Summary
- Context-based classification is useful on data streams with recurrent concepts
- Effectiveness depends on how reliably we are able to select the correct context
- Not a good idea when drifts are not detectable, or occur without correlation to some observed feature
- Unsupervised, so predictions get worse as time passes further from the original training data