Deep Learning in
Partially-labeled Data Streams
DEEP LEARNING IN DATA STREAMS
● The task of data-stream learning is to learn a classifier h that provides a classification ŷt+1 for each instance xt+1.
● The classification is often a single class label.
● This generalises to the multi-task/multi-dimensional case with multiple target variables.
● Prior to each classification, the true classification yt of the previous instance xt may be provided.
● At time step t, the data-stream learner uses each pair (xt, yt) as a training example to update its model.
● In the fully supervised case, the true classification is always made available prior to the next classification.
● In the semi-supervised/partially-labeled case, only a subset of instances have true labels, so for some instances there is no ground-truth classification (a sketch of this protocol follows).
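A minimal sketch of this protocol in Python. The `predict`, `update_supervised`, and `update_unsupervised` method names are hypothetical stand-ins for whatever incremental learner is used; unlabeled instances are represented by `y_t = None`.

```python
# A sketch of the partially-labeled stream protocol: classify each instance
# first, then update the model (supervised only when a true label arrives).
def run_stream(model, stream):
    predictions = []
    for x_t, y_t in stream:                     # stream yields (x_t, y_t) pairs
        predictions.append(model.predict(x_t))  # classify before seeing y_t
        if y_t is not None:
            model.update_supervised(x_t, y_t)   # labeled instance
        else:
            model.update_unsupervised(x_t)      # no ground-truth label
    return predictions
```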
Restricted Boltzmann Machines (RBMs)
● 5 input units
● 4 hidden units
● Each connection is associated with a weight.
Restricted Boltzmann Machines (RBMs)
● Fully unsupervised.
● Used to discover the underlying regularities of training data.
● Two layers of units: a visible layer (the original feature-attribute space) and a hidden layer.
● Fully connected between layers.
● Unconnected within layers.
● Each visible unit is connected to every hidden unit via a weight.
● The RBM learns these weights.
● The hidden variables can provide a compact representation of the underlying pattern of the input (a minimal training sketch follows).
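As an illustration, here is a minimal sketch of one contrastive-divergence (CD-1) training step for a binary RBM with sigmoid units, using only NumPy. The parameter names (`W` for the weight matrix, `b` and `c` for the visible and hidden biases) are this sketch's own conventions, not notation from the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(W, b, c, v0, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM; v0 is a single visible vector."""
    if rng is None:
        rng = np.random.default_rng()
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    # Negative phase: one Gibbs step back to a reconstruction.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Approximate gradient: data statistics minus reconstruction statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b += lr * (v0 - pv1)
    c += lr * (ph0 - ph1)
    return ph0  # hidden probabilities: the compact representation of v0
```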
Deep Belief Networks (DBNs)
● RBMs stacked on top of each other.
● The first RBM takes instances xt from the input space and produces output zt(1).
● This continues for L layers, up to the final output zt(L).
● Then supervised learning happens when labels are available.
● Two strategies:
○ DBN-h
○ DBN-BP
DBN-h
● An off-the-shelf incremental classifier h is trained on the final-layer output zt(L); the RBM weights themselves are not fine-tuned.
DBN-BP
● Predicts the labels directly by adding a final layer of dimension equal to that of the label space.
● To make the predictions correspond to the labels, the backpropagation algorithm is run to fine-tune the weights of the network.
● The difference from DBN-h is that here h is a linear layer, and the weights of the underlying RBMs are also updated (see the sketch below).
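A minimal sketch contrasting the two strategies on a single labeled instance, assuming `rbms` is the trained stack; the `up`, `update`, `forward`, `backward`, and `step` methods are hypothetical placeholders.

```python
def transform(rbms, x_t):
    # Pass x_t up the stack: z^(1), ..., z^(L).
    z = x_t
    for rbm in rbms:
        z = rbm.up(z)           # hypothetical upward pass (hidden activations)
    return z

def dbn_h_update(rbms, h, x_t, y_t):
    # DBN-h: an off-the-shelf incremental classifier h is trained on z^(L);
    # the RBM weights are left as learned in the unsupervised phase.
    h.update(transform(rbms, x_t), y_t)

def dbn_bp_update(net, x_t, y_t, lr=0.01):
    # DBN-BP: the stack plus a final linear layer form one network `net`;
    # backpropagation fine-tunes all weights, including the RBM layers.
    y_hat = net.forward(x_t)
    net.backward(y_hat - y_t)   # propagate the output error
    net.step(lr)
```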
Experiments
● Remove the labels of a certain proportion of the instances to simulate a partially-labeled stream.
● If a method only deals with labeled examples, it ignores the instances with no labels.
● Split each dataset into 20 evenly-sized windows.
● Create a model from the instances in the first window.
● Update it incrementally with each remaining instance.
● Gauge the accuracy of the classifier on each window first, before updating (see the sketch below).
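A sketch of this evaluation loop, assuming `model` exposes hypothetical `predict`/`update` methods, unlabeled instances carry `None` labels, and accuracy is scored only on instances whose true label is known.

```python
import numpy as np

def windowed_eval(model, X, Y, n_windows=20):
    windows = np.array_split(np.arange(len(X)), n_windows)
    for i in windows[0]:                      # build model on the first window
        if Y[i] is not None:
            model.update(X[i], Y[i])
    accuracies = []
    for w in windows[1:]:
        labeled = [i for i in w if Y[i] is not None]
        correct = sum(model.predict(X[i]) == Y[i] for i in labeled)
        accuracies.append(correct / len(labeled))  # gauge accuracy first...
        for i in labeled:                          # ...then update incrementally
            model.update(X[i], Y[i])
    return accuracies
```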
Datasets
● Enron E-mail Subset
○ Set of 1703 emails manually labeled into 53 categories.
○ Time ordered.
● 20 Newsgroups
○ Around 20,000 articles sourced from 20 newsgroups.
○ Ordered by date over several months.
● Aviation Safety Reports
○ 28,596 instances of aviation safety reports.
○ 22 possible problem labels, with an average of 2.16 problems per report.
● Forest Cover Type
○ 7 types of forest cover associated with cells based on 54 attributes.
○ 581,012 instances.
Evaluation
● A classifier for each dataset is created.
● Accuracy is gauged as the proportion of correct classifications, (1/N) Σt 1[ŷt = yt], averaged additionally over the k labels in the multi-label case.
● Enron and TMC7: learn k binary classifiers, one per label.
● 20NG and CType: learn a single multi-class classifier over k possible classes, due to the sparsity or absence of multi-labeling (a sketch follows).
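Under the reconstruction above, the two accuracy computations might look as follows; per-label averaging for the multi-label datasets is this sketch's assumption about the exact aggregation.

```python
import numpy as np

def multilabel_accuracy(Y_hat, Y):
    # Y_hat, Y: (N, k) binary matrices from the k per-label classifiers;
    # average correctness over both instances and labels.
    return float(np.mean(Y_hat == Y))

def multiclass_accuracy(y_hat, y):
    # y_hat, y: length-N vectors of predicted / true class indices.
    return float(np.mean(y_hat == y))
```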
Enron : HT vs. RBM-HT
● Take k-nearest neighbours (kNN) and Hoeffding trees (HT) and compare their performance with and without an RBM-transformed feature space.
20NG : HT vs RBM-HT
TMC7 : HT vs RBM-HT
CType : HT vs RBM-HT
Enron : kNN vs RBM-kNN
20NG : kNN vs RBM-kNN
TMC7 : kNN vs RBM-kNN
CType : kNN vs RBM-kNN
Run Times
Accuracy Rankings
Conclusion
● Improved accuracy of existing data-stream models.
● Developed two deep learning approaches:
○ DBN-h
○ DBN-BP