Deep Learning in
Partially-labeled Data Streams
DEEP LEARNING IN DATA STREAMS
● The task of data-stream learning is to learn a classifier h that provides a classification ŷt+1 for each instance xt+1.
● The classification is often a single class label.
● This generalises to the multi-task/multi-dimensional case with multiple target variables.
● Prior to each classification, the true classification yt of the previous instance xt may be provided.
● At time step t, the data-stream learner uses each pair (xt, yt) as a training example to update its model.
● In the fully supervised case, the true classification is always made available prior to the next classification.
● In the semi-supervised/partially-labeled case, only a subset of instances have true labels, so for some instances there is no ground-truth classification (a sketch of this protocol follows).
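A minimal sketch of this protocol in Python. The `predict`, `update_supervised`, and `update_unsupervised` method names are hypothetical stand-ins for whatever incremental learner is used; unlabeled instances are represented by `y_t = None`.

```python
# A sketch of the partially-labeled stream protocol: classify each instance
# first, then update the model (supervised only when a true label arrives).
def run_stream(model, stream):
    predictions = []
    for x_t, y_t in stream:                     # stream yields (x_t, y_t) pairs
        predictions.append(model.predict(x_t))  # classify before seeing y_t
        if y_t is not None:
            model.update_supervised(x_t, y_t)   # labeled instance
        else:
            model.update_unsupervised(x_t)      # no ground-truth label
    return predictions
```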
Restricted Boltzmann Machines (RBMs)
● 5 input units
● 4 hidden units
● Each connection is associated with a weight.
Restricted Boltzmann Machines (RBMs)
● Fully unsupervised.
● Used to discover the underlying regularities of training data.
● Two layers of units: a visible layer (the original feature-attribute space) and a hidden layer.
● Fully connected between layers.
● Unconnected within layers.
● Each visible unit is connected to every hidden unit via a weight.
● The RBM learns these weights.
● The hidden variables can provide a compact representation of the underlying pattern of the input (a minimal training sketch follows).
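As an illustration, here is a minimal sketch of one contrastive-divergence (CD-1) training step for a binary RBM with sigmoid units, using only NumPy. The parameter names (`W` for the weight matrix, `b` and `c` for the visible and hidden biases) are this sketch's own conventions, not notation from the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(W, b, c, v0, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM; v0 is a single visible vector."""
    if rng is None:
        rng = np.random.default_rng()
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    # Negative phase: one Gibbs step back to a reconstruction.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Approximate gradient: data statistics minus reconstruction statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b += lr * (v0 - pv1)
    c += lr * (ph0 - ph1)
    return ph0  # hidden probabilities: the compact representation of v0
```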
Deep Belief Networks (DBNs)
● RBMs stacked on top of each other.
● The first RBM takes instances xt from the input space and produces output zt(1).
● This continues for L layers, up to the final output zt(L).
● Then supervised learning happens when labels are available.
● Two strategies:
○ DBN-h
○ DBN-BP
DBN-h
● An off-the-shelf incremental classifier h is trained on the final-layer output zt(L); the RBM weights themselves are not fine-tuned.
DBN-BP
● Predicts the labels directly by adding a final layer of dimension equal to that of the label space.
● To make the predictions correspond to the labels, the backpropagation algorithm is run to fine-tune the weights of the network.
● The difference from DBN-h is that here h is a linear layer, and the weights of the underlying RBMs are also updated (see the sketch below).
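A minimal sketch contrasting the two strategies on a single labeled instance, assuming `rbms` is the trained stack; the `up`, `update`, `forward`, `backward`, and `step` methods are hypothetical placeholders.

```python
def transform(rbms, x_t):
    # Pass x_t up the stack: z^(1), ..., z^(L).
    z = x_t
    for rbm in rbms:
        z = rbm.up(z)           # hypothetical upward pass (hidden activations)
    return z

def dbn_h_update(rbms, h, x_t, y_t):
    # DBN-h: an off-the-shelf incremental classifier h is trained on z^(L);
    # the RBM weights are left as learned in the unsupervised phase.
    h.update(transform(rbms, x_t), y_t)

def dbn_bp_update(net, x_t, y_t, lr=0.01):
    # DBN-BP: the stack plus a final linear layer form one network `net`;
    # backpropagation fine-tunes all weights, including the RBM layers.
    y_hat = net.forward(x_t)
    net.backward(y_hat - y_t)   # propagate the output error
    net.step(lr)
```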
Experiments
● Remove the labels of a certain proportion of the instances to simulate a partially-labeled stream.
● If a method only deals with labeled examples, it ignores the instances with no labels.
● Split each dataset into 20 evenly-sized windows.
● Create a model from the instances in the first window.
● Update it incrementally with each remaining instance.
● Gauge the accuracy of the classifier on each window first, before updating (see the sketch below).
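A sketch of this evaluation loop, assuming `model` exposes hypothetical `predict`/`update` methods, unlabeled instances carry `None` labels, and accuracy is scored only on instances whose true label is known.

```python
import numpy as np

def windowed_eval(model, X, Y, n_windows=20):
    windows = np.array_split(np.arange(len(X)), n_windows)
    for i in windows[0]:                      # build model on the first window
        if Y[i] is not None:
            model.update(X[i], Y[i])
    accuracies = []
    for w in windows[1:]:
        labeled = [i for i in w if Y[i] is not None]
        correct = sum(model.predict(X[i]) == Y[i] for i in labeled)
        accuracies.append(correct / len(labeled))  # gauge accuracy first...
        for i in labeled:                          # ...then update incrementally
            model.update(X[i], Y[i])
    return accuracies
```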
Datasets
● Enron E-mail Subset
○ Set of 1703 emails manually labeled into 53 categories.
○ Time ordered.
● 20 Newsgroups
○ Around 20,000 articles sourced from 20 newsgroups.
○ Ordered by date over several months.
● Aviation Safety Reports
○ 28,596 instances of aviation safety reports.
○ 22 possible problem labels, with an average of 2.16 problems per report.
● Forest Cover Type
○ 7 types of forest cover associated with cells based on 54 attributes.
○ 581,012 instances.
Evaluation
● A classifier for each dataset is created.
● Accuracy is gauged as the proportion of correct classifications, (1/N) Σt 1[ŷt = yt], averaged additionally over the k labels in the multi-label case.
● Enron and TMC7: learn k binary classifiers, one per label.
● 20NG and CType: learn a single multi-class classifier over k possible classes, due to the sparsity or absence of multi-labeling (a sketch follows).
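Under the reconstruction above, the two accuracy computations might look as follows; per-label averaging for the multi-label datasets is this sketch's assumption about the exact aggregation.

```python
import numpy as np

def multilabel_accuracy(Y_hat, Y):
    # Y_hat, Y: (N, k) binary matrices from the k per-label classifiers;
    # average correctness over both instances and labels.
    return float(np.mean(Y_hat == Y))

def multiclass_accuracy(y_hat, y):
    # y_hat, y: length-N vectors of predicted / true class indices.
    return float(np.mean(y_hat == y))
```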
Enron : HT vs. RBM-HT
● Take k-nearest neighbours (kNN) and Hoeffding trees (HT) and compare their performance with and without an RBM-transformed feature space.
20NG : HT vs RBM-HT
TMC7 : HT vs RBM-HT
CType : HT vs RBM-HT
Enron : kNN vs RBM-kNN
20NG : kNN vs RBM-kNN
TMC7 : kNN vs RBM-kNN
CType : kNN vs RBM-kNN
Run Times
Accuracy Rankings
Conclusion
● Improved accuracy of existing data-stream models.
● Developed two deep learning approaches:
○ DBN-h
○ DBN-BP