
Conclusion


Our results show that, when learning object categories with a lot of label noise, it pays off to preprocess the data and automatically remove the wrongly labeled or otherwise difficult examples prior to training, rather than leaving the training algorithm to try to ignore them while learning.
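
As a rough illustration of this prune-then-train idea, the sketch below uses a cross-validated consensus filter: a few base learners produce out-of-fold predictions, examples whose labels all learners contradict are dropped, and the final classifier is trained on the remainder. The learners, threshold, and helper name are illustrative assumptions, not the pruning algorithm developed in this thesis.

```python
# Hedged sketch of prune-then-train: flag likely mislabeled or hard examples
# with a cross-validated consensus filter, drop them, and retrain on the rest.
# The models and threshold are illustrative stand-ins, not the thesis method.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC


def prune_then_train(X, y, disagreement_threshold=1.0):
    # Out-of-fold predictions from a few different base learners.
    learners = [SVC(kernel="rbf"), AdaBoostClassifier(n_estimators=50)]
    oof_preds = np.stack(
        [cross_val_predict(clf, X, y, cv=5) for clf in learners]
    )
    # Fraction of learners whose out-of-fold prediction disagrees with the label.
    disagreements = (oof_preds != y).mean(axis=0)
    # Keep an example unless every learner contradicts its label.
    keep = disagreements < disagreement_threshold
    # Train the final classifier only on the retained (pruned) data.
    final_model = SVC(kernel="rbf")
    final_model.fit(X[keep], y[keep])
    return final_model, keep
```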

Data pruning for learning the face category is demonstrated to work well both with algorithms that are sensitive to noise (AdaBoost) and with algorithms that have inherent regularization capabilities (SVM).

The algorithm does not deteriorate performance when no noise is introduced; instead, it removes some examples which are inherently difficult to learn. Data pruning is very helpful, and indeed necessary, at high levels of noise.

A future plan is to extend the algorithm to work on non-aligned images of the target category, which, for visual categories, would require more involved feature and algorithm selection rather than a conceptual change in the data pruning ideas.

Chapter 5

Conclusion

In this work we have shown the usefulness of processing the training data prior to learning, in which examples that are difficult to learn are identified and eliminated.

In contrast to methods for learning in the presence of noise, where general penalties are imposed, data pruning advocates the direct elimination of examples that are difficult to accommodate. There may be examples which are so hard for the model that they influence the solution in an adverse way, even though the solution is regularized. We show cases in which data pruning is helpful even for algorithms which have inherent capabilities of ignoring noisy examples.

Unlike methods which assume a known noise or data model, assume an infinite amount of training data, or simply resort to an additional training set to do the pruning, we restrict the problem to the given training data and do not explicitly model the noise or the data. To our knowledge this is the only work that considers the problem in this more realistic setting.

With this work we have shown that, in learning, the quality of the examples is also important for better generalization.

An important future direction is to theoretically quantify example difficulty with respect to a model, and the influence of troublesome examples in the dataset on the generalization error of a learning model.

The proposed method can be used for acquiring visual category data with very little supervision. For example, images of an object category returned by search engines can be used for training, but they contain a large amount of noise. We demonstrate the usefulness of the approach on a very challenging face dataset with large levels of noise. Training on the pruned data gives considerably improved performance compared to training on the full data, especially for large amounts of contamination.

An extension of data pruning to object recognition with minimal supervision, namely when the exact object position in the image is not known, would be a very useful and challenging research direction.

