ITSI Transactions on Electrical and Electronics Engineering (ITSI-TEEE)
________________________________________________________________________________________________
________________________________________________________________________________________________
ISSN (PRINT) : 2320 – 8945, Volume -3, Issue -1, 2015 16
“Gender Based Acoustic Features and Spectrogram Analysis for Kannada Phonetics”
1Shiva Prasad.K.M, 2Anil Kumar.C, 3M.B.Manjunatha, 4KodandaRamaiah.G.N
1,2ResearchScholar, Jain University, B’lore
3PRINCIPAL ,AIT Tumkur.
4Professor & HOD Dept of ECE, Kuppam Engineering College, Kuppam., Email: 1[email protected].
Abstract- Human speech has a hierarchical structure that carries meaningful information. Speech is not a sequence of steady-state sounds abruptly changing from one to another. It consists of sentences; sentences can be divided into words, and words are constituted by phonemes, which are the basic elements of voice construction. We use language every day without devoting much thought to the process. A study of the acoustic characteristics of any language begins with the phonetic analysis of that language. Speech is one of the most information-laden signals: speech sounds have a rich and multi-layered temporal variation that conveys emotions, words, intention, expression, intonation, accent, speaker identity, gender, age, style of speaking and the state of health of the speaker. The major concern in any language is the identification, separation and processing of the vowels and consonants of that language. In our study we analyse pre-recorded samples from various subjects differing in age and gender. The analysis is carried out with respect to basic acoustic features and the spectrogram, in order to extract formant frequency, pitch and intensity using the open source tool PRAAT.
Key words: Formant frequency, Intensity, Phonetics, Pitch, PRAAT and Spectrogram.
I. INTRODUCTION.
Speech is the most convenient means of human communication. Speech is not merely a sequence of steady-state sounds abruptly changing from one to another, nor just a signal that can be ignored after hearing; it is a unique signal that conveys linguistic and non-linguistic information, and it carries a message drawing on multiple levels of knowledge sources.
Speech signals are examples of information-bearing signals that evolve as functions of a single independent variable, time. Speech is not only an informative signal but also a complex wave, the acoustic output of the speaker’s effort. Speech serves to communicate from a speaker to one or more listeners. The typical sound produced when a phoneme is articulated is called a phone. Most languages have 20-40 phonemes, which provide an alphabet of sounds to describe the different words in the language. Words are composed of phoneme sequences called syllables.
Speech analysis is also referred to as feature extraction of speech.[1]
Speech sounds are sensations of air pressure vibrations produced by air exhaled from the lungs, modulated and shaped by the vibrations of the vocal cords and the resonance of the vocal tract as the air is pushed out through the lips and nose.
Feature extraction refers to the conversion of the speech waveform into some type of parametric representation for further speech processing in various applications.
The parameters obtained by analysis tools are important acoustic cues. Speech analysis helps to reduce the raw speech database (speech corpus) to a manageable quantity and to extract from the raw data the information that is crucial in interpreting the speech signal. The detailed analysis of speech features and their relationship to human perception is a challenging task in speech processing.[3]
Speech is an immensely information-rich signal exploiting frequency-modulated, amplitude-modulated and time-modulated carriers (e.g. resonance movements, harmonics and noise, pitch intonation, power, duration) to convey information about words, speaker identity, accent, expression, style of speech and emotion. All this information is conveyed primarily within the traditional telephone bandwidth of 4 kHz. The speech energy above 4 kHz mostly conveys audio quality and sensation.
1.2. ABOUT KANNADA LANGUAGE.
Kannada is the official language of Karnataka state in the southern part of India. Kannada speech consists of vowels (swaras), consonants (vyanjanas) and yogavahakas (part vowel, part consonant). The vowel group comprises 14 characters. The consonants, 36 in total, are broadly classified into structured and unstructured consonants. The structured consonants fall into 5 categories, namely velars, palatals, retroflexes, dentals and labials, with five in each category, forming 25 in total. The unstructured consonants are 11 in number. The yogavahakas, combinations of part vowel and part consonant, are 2 in number. In total, the Kannada language comprises 52 letters, named the Kannada varnamala/aksharamala.
II. SCOPE OF THE WORK.
Any language can be described in terms of a set of distinctive sounds called phonemes. Phonetics is the study of speech sounds; phonemes are symbols that show how a word is pronounced. Phonemes are more refined speech sounds; the study of phonemes is called phonemics, and the study of sound variations is called phonetics.
Phonetics is of three types: articulatory, acoustic and auditory. Acoustic phonetics is the study of the sound waves made by the human vocal organs; it studies the physical properties of sounds and provides a language to distinguish one sound from another in quality and quantity. A study of the acoustic characteristics of any language begins with the phonetic analysis of that language. Sounds are broadly classified into vowels and consonants. Vowels are those which allow unrestricted air flow in the vocal tract; consonants are those which restrict air flow at some point. Consonants have weaker intensity than vowels.[6]
The major concern in any language is the processing of vowels (V), consonants (C) and their combinations, such as VCV, CVC and CCVC words and sentences. Speech is random in nature, generated by the excitation of many organs in the human body, and consists of complex and simple resonant frequencies called formants. Measuring the formant frequencies gives an idea of the utterances in speech and also of the voice quality. We focus on the spectrogram technique: we analyse and extract the formant frequencies, intensity and pitch from speech samples recorded using the GOLDWAVE software and analysed using the PRAAT spectrogram technique.[5]
2.2. OBJECTIVE OF THE STUDY.
The objective is to generate the spectrogram for the different vowels of the Kannada language for male and female speakers of various age groups. We use the PRAAT software to analyse the formant frequencies, intensity and pitch of each speech sample by plotting the spectrogram of each sample for each subject.
2.3 DATABASE CREATION
The test samples considered here are from male and female subjects labelled Sp1, Sp2, Sp3 and Sp4 (Sp1 and Sp2 are male speakers, Sp3 and Sp4 are female speakers), who uttered speech samples of the Kannada vowels (a - au).
2.4. PROBLEM STATEMENT.
Kannada speech consists of words comprising vowels, yogavahakas and consonants, which may be either voiced or unvoiced sounds. The problem considered here is to take different vowels and yogavahakas and analyse them with respect to fundamental acoustic features such as formant frequencies, intensity and pitch.
Formant frequency:
Formants are the natural resonant frequencies of the vocal tract, which depend on the position and manner of articulation. Generally 3 formants are important, but higher formants are necessary for acceptable speech quality. The formant frequencies appear as dark bands in the spectrogram. The resonating frequencies are decided by the mode of excitation in the vocal cavity. Formants are useful for speech recognition and speaker verification.
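As an illustration of how resonant peaks can be located numerically, the following is a minimal sketch that fits a linear-prediction (LPC) model to one speech frame and reports the peaks of the resulting spectral envelope as formant candidates. It is not the procedure used by PRAAT in this study; the function names, the default model order and the 10 Hz frequency grid are assumptions made for illustration only.

```python
import cmath
import math


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns [1, a1, ..., ap] so that A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    """
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    if r[0] == 0.0:
        raise ValueError("frame is silent; LPC is undefined")
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= (1.0 - k * k)                # remaining prediction error
    return a


def estimate_formants(frame, sample_rate, order=10, step_hz=10.0):
    """Return frequencies (Hz) of the local maxima of the LPC envelope.

    The order and grid step are illustrative defaults (assumptions).
    The caller may pre-emphasise or window the frame beforehand.
    """
    a = lpc_coefficients(frame, order)
    n_points = int(sample_rate / 2 / step_hz) + 1
    freqs = [i * step_hz for i in range(n_points)]
    envelope = []
    for f in freqs:
        w = 2.0 * math.pi * f / sample_rate
        denom = sum(a[j] * cmath.exp(-1j * w * j) for j in range(order + 1))
        envelope.append(1.0 / max(abs(denom), 1e-12))
    # Interior local maxima of the envelope are the formant candidates.
    return [freqs[i] for i in range(1, n_points - 1)
            if envelope[i - 1] < envelope[i] > envelope[i + 1]]
```

The first three values returned for a vowel frame would correspond to candidates for F1, F2 and F3; in practice the candidates usually need checking against the spectrogram, since envelope peaks are not guaranteed to be true formants.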
2.5. HUMAN-SPEECH PRODUCTION MECHANISM.
An outline of the anatomy of the human speech production system is shown in Fig 1. It is divided into three major regions, namely the larynx, the vocal tract and the respiratory system. The combined voice production mechanism produces the variety of vibrations and spectral-temporal compositions that form different speech sounds. The act of speech production begins with exhaling inhaled air from the lungs. Without the subsequent modulations, this air would sound like random noise with no information. The information is first modulated onto the passing air by the manner and frequency of the closing and opening of the glottal folds.
Speech is a convolved signal; linear filtering is not suited to the analysis of convolved signals like speech.[5]
The signal processing front end for extracting the feature set is an important stage in any speech processing application, such as speech analysis and speech synthesis.
Fig 1.Anatomy of speech production.
III. BASIC ACOUSTIC FEATURES:
In human speech, pitch, formant frequency and intensity form the basic features. A brief description of each is as follows:
Pitch: It is the repetition rate of the opening and closing of the vocal folds, that is, the fundamental frequency of vibration of the vocal folds (f0). Pitch is used for the classification of voiced and unvoiced speech. Pitch for a voiced component is low, and for an unvoiced component it is absent. A commonly used pitch measuring method is autocorrelation.[2]
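The autocorrelation method can be sketched as follows: for one short frame, the lag at which the normalised autocorrelation is largest is taken as the pitch period, and the frame is treated as unvoiced when no lag correlates strongly. This is a simplified illustration, not PRAAT's own pitch algorithm; the function name, the 75-500 Hz search range and the voicing threshold of 0.3 are assumptions.

```python
def estimate_pitch_autocorr(frame, sample_rate, f0_min=75.0, f0_max=500.0,
                            voicing_threshold=0.3):
    """Estimate f0 (Hz) of one frame, or return None if it looks unvoiced.

    The search range and threshold are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]               # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                             # silence: no pitch
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalised autocorrelation at this lag (1.0 at lag 0).
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None                             # weak periodicity: unvoiced
    return sample_rate / best_lag
```

The frame should span at least two pitch periods of the lowest expected f0, so a frame of about 40 msec is a reasonable choice for the range assumed here.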
Intensity: It is an acoustic feature that models the energy (loudness) of a sound, simulating the way it is perceived by the human ear, by calculating the sound amplitudes in different intervals. It is usually expressed in decibels (dB).[2]
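A minimal sketch of this calculation is given below: the root-mean-square amplitude of each interval is converted to decibels relative to a reference pressure. The function names, the frame and hop lengths, and the reference of 2e-5 Pa are assumptions; the absolute dB values are meaningful only if the samples are calibrated in pascal, otherwise only differences between values should be compared.

```python
import math


def intensity_db(samples, reference=2e-5):
    """RMS level of a block of samples in dB relative to `reference`.

    The 2e-5 Pa reference is an assumption (calibrated samples in pascal).
    """
    if not samples:
        raise ValueError("need at least one sample")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")                    # digital silence
    return 20.0 * math.log10(rms / reference)


def intensity_contour(samples, sample_rate, frame_ms=32.0, hop_ms=8.0,
                      reference=2e-5):
    """Intensity in dB for successive overlapping intervals of the signal."""
    frame_len = int(sample_rate * frame_ms / 1000.0)
    hop = max(1, int(sample_rate * hop_ms / 1000.0))
    if frame_len < 1:
        raise ValueError("frame_ms is too short for this sample rate")
    return [intensity_db(samples[start:start + frame_len], reference)
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

Doubling the amplitude of a signal raises its intensity by about 6 dB, which is a quick check when comparing recordings of different speakers.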
Spectrogram: A spectrogram converts a two-dimensional waveform (amplitude v/s time) into a three-dimensional pattern (amplitude v/s frequency v/s time). Spectrograms are broadly classified into two major types, namely wideband and narrowband spectrograms. The wideband spectrogram is the basic tool for spectral analysis. Peaks in the spectrum appear as dark horizontal bands called formant resonances. Voiced sounds cause vertical marks in the spectrogram due to the increase in speech amplitude each time the vocal folds close. Spectrograms are used primarily to examine formant centre frequencies. Wideband spectrograms employ 300 Hz band-pass filters with a response time of a few msec, which yields good time resolution (accurate durational measurement).[2] [3]
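The conversion from waveform to time-frequency pattern can be sketched with a short-time Fourier analysis, as below. A short analysis window of a few msec gives a wideband spectrogram with good time resolution, whereas a longer window (for example 20-30 msec) gives a narrowband one. The function name and the default window, hop and transform sizes are assumptions, and the direct transform used here is written for clarity rather than speed.

```python
import cmath
import math


def spectrogram(samples, sample_rate, window_ms=5.0, hop_ms=1.0, n_fft=128):
    """Return (times, freqs, rows), where rows[t][k] is the power in dB
    at time times[t] and frequency freqs[k].

    All default parameter values are illustrative assumptions.
    """
    win_len = int(sample_rate * window_ms / 1000.0)
    hop = max(1, int(sample_rate * hop_ms / 1000.0))
    if win_len < 2 or win_len > n_fft:
        raise ValueError("window must hold 2..n_fft samples")
    # Hamming window to reduce spectral leakage.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (win_len - 1))
              for i in range(win_len)]
    n_bins = n_fft // 2 + 1
    # Precomputed complex exponentials of the discrete Fourier transform.
    twiddle = [[cmath.exp(-2j * math.pi * k * i / n_fft)
                for i in range(win_len)] for k in range(n_bins)]
    freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    times, rows = [], []
    for start in range(0, len(samples) - win_len + 1, hop):
        seg = [samples[start + i] * window[i] for i in range(win_len)]
        row = []
        for k in range(n_bins):
            acc = sum(s * t for s, t in zip(seg, twiddle[k]))
            row.append(10.0 * math.log10(abs(acc) ** 2 + 1e-12))
        rows.append(row)
        times.append((start + win_len / 2.0) / sample_rate)
    return times, freqs, rows
```

Plotting rows as grey levels against times and freqs gives the familiar picture in which strong spectral peaks show up as dark horizontal bands.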
3.2 ABOUT PRAAT:
We have used PRAAT as our analysis platform because
it is freely available open source software and runs on a variety of platforms, namely Windows, Macintosh, Linux and Solaris.
it provides a variety of data structures, such as text grid, pitch tier and table, to represent the various types of information used in extracting prosodic features.
PRAAT commands can be used as a programming language.
it can be used as a suite of high quality speech analysis routines, such as pitch tracking.
3.3 APPLICATIONS.
Language processing: Analysing the gender of a voice.
Speech recognition & understanding: Speech recognition refers to the recognition of each word in speech, whereas speech understanding means extracting the meaning of what is said from the words recognized.
Home automation: Electronic devices can be controlled automatically through speech.
Robotics: The human-machine interface becomes much easier in robotic applications.
Human speech recognition: Studies indicate that humans do not appear to use spectral templates; instead they perform partial recognition of phonetic units independently in different frequency bands.
IV. RESULTS.
Tables 1 and 2 show the tabulated results for the various samples considered, and Fig 2, Fig 3, Fig 4 and Fig 5 show the spectrograms of the Kannada letter “ಅ” for the four speakers of different ages, as mentioned.
Table 1: Tabulated results of male speakers of different age.
Table 2: Tabulated results of female speakers of different age.
(In Table 1 and Table 2, F1, F2, F3 and pitch are expressed in hertz (Hz), and intensity is expressed in decibels (dB).)
Fig 2: spectrogram of female speaker of age 24 years for the sample “ಅ”
Fig 3: spectrogram of female speaker of age 18 years for the sample “ಅ”
Fig 4: spectrogram of male speaker of age 15 years for the sample “ಅ”
Fig 5: spectrogram of female speaker of age 34 years for the sample “ಅ”
V. CONCLUSION:
Basic acoustic feature analysis helps in the identification of phonemes, such as vowels and consonants, in any language, and can be used as a tool for identifying the resonant frequencies for different genders and ages. The resonant peaks are found to be different for each subject, due to the parameters affecting the speech production mechanism. This can be described by studying the spectrogram for different phonemes. The spectrograms are plotted for various phoneme utterances, and the basic acoustic parameters are extracted and tabulated. The pitch and intensity variation from speaker to speaker is identical, and the formant frequencies are found to be one of the deciding factors with respect to the basic acoustic parameters of human speech.
REFERENCES:
[1] L. Rabiner and B. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993.
[2] L. Rabiner, B. Juang and Yegnanarayana, “Fundamentals of Speech Recognition”, First Edition.
[3] L. Welling and H. Ney, “A Model for Efficient Formant Estimation”, Proc. IEEE ICASSP, pp. 797-800, Atlanta, 1996.
[4] Shaila D Apte, “Speech & Audio Processing”, First Edition, Wiley India Pvt Ltd., 2013.
[5] Thomas F Quatieri, “Discrete Time Speech Processing Principles & Practices”, First Edition, Pearson Education Ltd., 2011.
[6] Douglas O Shaughnessy, “Speech Communication- Human & Machine”, Second Edition, University Press (INDIA) Ltd., 2009.