
Phonemes are the basic units of any language. If Bangla phonemes or syllables can be separated from continuous speech, recognition performance improves and storage requirements are reduced.

We have presented an approach to segmenting speech at the word level. Syllable segmentation can be performed using relative pitch and relative energy values, and Bangla speech corpora can also be segmented at the sentence level using prosodic features.
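The word-level segmentation idea above can be illustrated with a minimal short-time-energy sketch. This is an assumption-laden toy, not the exact method of this work: the frame length (20 ms) and the threshold (a fixed fraction of the peak frame energy) are hypothetical parameters chosen for illustration.

```python
import numpy as np

def segment_by_energy(signal, sr, frame_ms=20, threshold_ratio=0.1):
    """Split speech into word-level segments by short-time energy:
    frames whose energy falls below a fraction of the peak frame
    energy are treated as inter-word silence."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    energy = np.sum(frames.astype(float) ** 2, axis=1)
    voiced = energy > threshold_ratio * energy.max()

    segments = []  # (start_sample, end_sample) pairs
    start = None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i * frame_len          # silence -> speech transition
        elif not v and start is not None:
            segments.append((start, i * frame_len))  # speech -> silence
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments

# Two synthetic "words" (bursts of a 440 Hz tone) separated by silence.
sr = 8000
t = np.arange(sr) / sr
word = np.sin(2 * np.pi * 440 * t[:2000])
signal = np.concatenate([np.zeros(1000), word, np.zeros(2000), word, np.zeros(1000)])
print(segment_by_energy(signal, sr))
```

A real system would combine this energy cue with the pitch information discussed above to avoid splitting words at unvoiced consonants.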

Because the features of one word may be confused with those of another, feature extraction is important. We employed MFCCs for feature extraction, but delta and delta-delta MFCC coefficients and the cepstrum can also be applied.
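The delta and delta-delta coefficients mentioned above are simple regressions over the MFCC stream. The sketch below uses the standard regression formula with a window of N = 2; the random matrix is only a stand-in for real MFCC frames.

```python
import numpy as np

def delta(features, N=2):
    """Regression-style delta coefficients over a (frames x coeffs)
    feature matrix: d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2*sum_n n^2)."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")  # repeat edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return np.array([
        sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1))
        for t in range(features.shape[0])
    ]) / denom

mfcc = np.random.default_rng(0).normal(size=(10, 13))  # stand-in for 13-dim MFCCs
d1 = delta(mfcc)        # delta
d2 = delta(d1)          # delta-delta is the delta of the delta stream
full = np.hstack([mfcc, d1, d2])  # the common 39-dimensional feature vector
print(full.shape)
```

Stacking the static, delta, and delta-delta coefficients yields the 39-dimensional vectors commonly used in speech recognizers.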

Neural networks can also be used for word separation, and their accuracy is likely to exceed that of our approach since they learn from training data. Hidden Markov models can be used as the speech recognizer.
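At the core of an HMM-based recognizer is Viterbi decoding: finding the most likely hidden-state sequence for an observation sequence. The following is a minimal sketch with a hypothetical two-state, two-symbol model, not a full recognizer.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-observation HMM,
    computed in log space with backpointers."""
    n_states = len(start_p)
    T = len(obs)
    logp = np.full((T, n_states), -np.inf)       # best log-prob per (time, state)
    back = np.zeros((T, n_states), dtype=int)    # backpointers
    logp[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            scores = logp[t - 1] + np.log(trans_p[:, s])
            back[t, s] = int(np.argmax(scores))
            logp[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    # Backtrack from the best final state.
    path = [int(np.argmax(logp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy model: state 0 prefers symbol 0, state 1 prefers symbol 1.
start = np.array([0.8, 0.2])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
emit = np.array([[0.7, 0.3],
                 [0.2, 0.8]])
print(viterbi([0, 0, 1, 1, 1], start, trans, emit))
```

In a recognizer the observations would be quantized feature vectors (e.g. MFCC codebook indices) and the states would model sub-word units.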

A common Bangla speech database, containing male and female speakers of different age groups from different regions of Bangladesh, needs to be built so that recognition performance can be compared across systems.

