This Report Presented in Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Electronics and Telecommunication

This Project entitled Feature Extraction of Bangla Vowels and Consonants, submitted by Snigdha Islam and Shakhawat Hossan to the Department of Electronics and Telecommunication Engineering, Daffodil International University, has been accepted as satisfactory in partial fulfillment of the requirements for the degree of B.Sc. This thesis deals with the study of Bangla phoneme analysis, which is the basis of Bangla speech processing. It describes the LPC speech signal analysis technique and applies it to Bangla vowels and consonants to extract features.

They have successfully compared the formant structure of vowels in Bangla with that of Japanese and American English. Hossain has done a power spectrum analysis of a good number of Bangla vowels and consonants. Talukdar also analyzed the formant frequencies and power spectrum of some Bangla vowels and consonants.

Lutfor Rahman evaluated the first three formant frequencies and bandwidths of all Bangla vowels for different age groups. As the first phase of the study of Bangla speech processing, we selected Bangla vowels and consonants in isolated utterances for analysis.

The Mechanism of Voice [Speech] Production

Introduction
Fundamentals of voice production

Breathing
Phonation
Resonance

Components of the Speech Production System
Chapter 3

Impulses sent by the brain when we intend to speak signal the laryngeal muscles to close the vocal folds. The rapid vibration of the vocal folds produces the sound waves in the air which are the basic tones of our voices. The larynx is located at the top of the trachea, behind the Adam's apple.

The mucous membrane and Reinke's space are together known as the 'envelope' of the vocal folds. The higher the voice, the faster the rate of vibration of the vocal folds. The more elongated and thinner the edges of the vocal folds become, the higher the pitch will be.

The soft palate can be raised, so that the escape of the air is only possible through the mouth.

Acoustic Phonetics Classification of Speech Signal

Introduction
Phonemes
Models for Speech Production

Modeling of the Speech Production System
Model based upon the acoustic theory (Source-Filter Model)

How speech can be modeled as a source signal passing through a filter
The Make-Up of Speech

Signal Processing Considerations
Signal Processing Representation of the Source Filter Model
Properties of Vowel Sounds
Source-Filter Model
Vowel Source
Vowel Filter

Fundamental Properties of Speech Signal

Variation in Frequencies of Source or Pitch
Variation in Sound Quality or Formants

Articulatory Phonetics

One of the simple resonators forms the complex resonance system of the vocal tract. The source-filter model is a useful analytical model of how speech sounds are produced; it emphasizes the independence of the sound source in the vocal tract from the filter that shapes that sound. The source-filter model of vowel production states that the frequency content of a vowel can be explained by considering how the spectrum of sound generated by the larynx is filtered by the vocal tract system.

The source-filter model also helps quantify vowels, as we can separately measure the contribution of the source and the filter to the final vowel sound. The primary characteristic of the source is its fundamental frequency, while the primary characteristic of the filter can be reduced to the location in frequency of the vocal tract's resonances, or formants (see Figure 3).

The samples of a speech signal are assumed to be the output of a linear time-varying system. An input x(t) to a filter with impulse response h(t) gives an output y(t) equal to the convolution of the two: y(t) = x(t) * h(t).
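The convolution relation between source and filter can be illustrated in discrete time. The sketch below is only an illustration under stated assumptions: the impulse-train source, the three-sample impulse response and the function name convolve_direct are hypothetical and are not taken from the recorded speech material.

```python
import numpy as np

def convolve_direct(x, h):
    """Discrete convolution y[n] = sum_k x[k] * h[n - k], written out explicitly."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        # Each input sample launches a scaled, delayed copy of the impulse response.
        y[k:k + len(h)] += xk * h
    return y

# Hypothetical source: an impulse train with one pulse every 4 samples.
source = np.zeros(12)
source[::4] = 1.0

# Hypothetical filter: a short decaying impulse response.
h = np.array([1.0, 0.5, 0.25])

output = convolve_direct(source, h)
```

Each pulse of the source reproduces a copy of h in the output, which is the discrete-time picture of a periodic excitation being shaped by a filter.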

For vowel sounds, the sound source is the regular vibration of the vocal folds in the larynx, and the filter is the entire vocal tract between the larynx and the lips. The frequency response of the vocal tract filter for vowels shows a small number of resonant peaks called formants. Pitch is the result of changes in the vibration frequency of the source (glottis).

In the context of speech production, the resonance frequencies of the vocal tract tube are called formant frequencies or simply formants. Formant frequencies depend on the shape and dimensions of the vocal tract; each shape is characterized by a set of formant frequencies. Thus, the spectral properties of the speech signal change with time as the shape of the vocal tract changes.
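A single formant can be pictured as the peak in the frequency response of a two-pole digital resonator. The following is a minimal sketch under assumptions: the resonator form, the function name resonator_response, and the values 700 Hz, 80 Hz and 8000 Hz are illustrative choices, not measurements from this study.

```python
import numpy as np

def resonator_response(formant_hz, bandwidth_hz, fs, n_points=4096):
    """Magnitude response of H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2)."""
    r = np.exp(-np.pi * bandwidth_hz / fs)    # pole radius set by the bandwidth
    theta = 2.0 * np.pi * formant_hz / fs     # pole angle set by the resonance frequency
    freqs = np.linspace(0.0, fs / 2.0, n_points)
    z_inv = np.exp(-1j * 2.0 * np.pi * freqs / fs)
    denom = 1.0 - 2.0 * r * np.cos(theta) * z_inv + (r ** 2) * z_inv ** 2
    return freqs, 1.0 / np.abs(denom)

# Hypothetical resonance: 700 Hz centre, 80 Hz bandwidth, 8 kHz sampling rate.
freqs, magnitude = resonator_response(700.0, 80.0, 8000.0)
peak_hz = freqs[np.argmax(magnitude)]
```

The peak of the response lies very close to the chosen resonance frequency; cascading several such sections with different pole angles would give a response with several formant peaks.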

To produce an unvoiced sound, the vocal tract is excited by random white noise, and the shape of the vocal tract uniquely determines the sound that is produced. Regardless of the fundamental frequency, the air in the vocal tract vibrates at three or four frequencies, which are determined by the shape of the vocal tract and the speed of vibration.

Mathematical Tools Used

Discrete Time Signal
The Fourier Transforms

The Discrete Fourier Transform (DFT)
Fast Fourier Transform (FFT)

Speech Materials
Linear Predictive Coding (LPC)
LPC Model
Chapter 6

An ideal theoretical sampler produces samples equivalent to the instantaneous value of the continuous signal at the desired points. For time-varying functions, let s(t) be a continuous function (or "signal") to be sampled, and let the sampling be performed by measuring the value of the continuous function every T seconds, called the sampling interval. Mathematically, the modulated Dirac comb is equivalent to the product of the comb function with s(t).
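Ideal sampling can be sketched by evaluating s(t) at the instants t = nT. In the example below, the 200 Hz tone, the 8 kHz sampling rate and the helper name sample_signal are assumptions chosen only for illustration.

```python
import numpy as np

def sample_signal(s, T, n_samples):
    """Ideal sampling: evaluate the continuous function s at t = n * T."""
    n = np.arange(n_samples)
    t = n * T
    return t, s(t)

# Hypothetical continuous signal: a 200 Hz sine tone.
def tone(t):
    return np.sin(2.0 * np.pi * 200.0 * t)

fs = 8000.0          # assumed sampling rate in Hz
T = 1.0 / fs         # sampling interval in seconds
t_n, x_n = sample_signal(tone, T, 80)
```

With these values one period of the tone spans 40 samples, so the 80 samples cover exactly two periods.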

Each value of the function is usually expressed as a complex number (called the complex amplitude) that can be interpreted as a magnitude and a phase component. When sampling a time-domain function for ease of storage or computing, it is still possible to recreate a version of the original Fourier transform according to the Poisson summation formula. The second image shows the plot of the real and imaginary parts of this function.

The result is that when you integrate the real part of the integrand, you get a relatively large number (0.5 in this case). The DFT transforms one function into another, which is called the frequency domain representation, or simply the DFT, of the original function (which is often a time domain function). This chapter also gives a brief overview of the speech signal analysis techniques involved, with a particular focus on emphasis, variable resolution spectral analysis, filter bank analysis, linear predictive (LPC) analysis, and delta, acceleration and feature normalization.
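The mapping from a time-domain sequence to its frequency-domain representation can be written out directly from the DFT definition. This is a minimal sketch, assuming a 1000 Hz cosine test signal, 64 samples and an 8 kHz sampling rate; the function name dft is hypothetical. NumPy's numpy.fft.fft computes the same transform with the FFT algorithm.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)   # N x N matrix of complex exponentials
    return W @ x

fs = 8000.0                               # assumed sampling rate in Hz
N = 64
n = np.arange(N)
x = np.cos(2.0 * np.pi * 1000.0 * n / fs) # hypothetical 1000 Hz test tone

X = dft(x)
peak_bin = int(np.argmax(np.abs(X[:N // 2])))
peak_hz = peak_bin * fs / N               # frequency of the strongest component
```

Each X[k] is a complex amplitude whose magnitude and phase describe the component at frequency k * fs / N.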

Our goal in speech signal processing is to obtain a more appropriate or useful representation of the information conveyed by the speech signal. Since the parameters used in most speech processing applications are derived from the frequency domain representation, the main task is to calculate the speech spectrum. First, the acoustic analysis of the vocal mechanism shows that the concept of normal or natural frequency allows a concise description of speech sounds.

Linear predictive coding (LPC) is defined as a digital method of encoding an analog signal in which a given value is predicted by a linear function of the previous values of the signal. The LPC model is based on a mathematical approximation of the vocal tract, represented by a tube of varying diameter. At a given time t, the speech sample s(t) is represented as a linear sum of the p preceding samples.
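One common way to obtain the p predictor coefficients is the autocorrelation method, which solves a set of normal equations built from the autocorrelation of the frame. The sketch below is an assumption about how this could be done, not necessarily the procedure used in this work; the function name lpc_coefficients and the second-order test filter are hypothetical.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Return a[1..p] such that s[n] is approximated by sum_k a[k] * s[n-k]."""
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation values r[0..p] of the frame.
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    # Toeplitz matrix of the normal equations R a = r[1..p].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

# Hypothetical test signal: impulse response of a known all-pole filter
# s[n] = 1.5 s[n-1] - 0.8 s[n-2] + impulse.
true_a = np.array([1.5, -0.8])
s = np.zeros(400)
s[0] = 1.0
for n in range(1, len(s)):
    s[n] += true_a[0] * s[n - 1]
    if n >= 2:
        s[n] += true_a[1] * s[n - 2]

estimated_a = lpc_coefficients(s, order=2)
```

For this synthetic signal the estimated coefficients match the coefficients of the generating filter; on real speech frames a window is usually applied first and the order p is chosen larger.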

The most important aspect of LPC is the linear predictive filter, which allows the value of the next sample to be determined by a linear combination of previous samples. Excitation is the type of sound sent into the filter or vocal tract, and articulation is the transformation of the excitation signal into speech.
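The roles of the predictive filter and the excitation can be illustrated by predicting each sample from the previous ones and inspecting what is left over. This is a sketch under assumptions: the coefficients, the two-pulse excitation and the helper name predict are hypothetical.

```python
import numpy as np

def predict(signal, a):
    """Predict each sample as a linear combination of the previous len(a) samples."""
    signal = np.asarray(signal, dtype=float)
    predicted = np.zeros_like(signal)
    for k, ak in enumerate(a, start=1):
        predicted[k:] += ak * signal[:-k]
    return predicted

# Hypothetical predictor coefficients and a two-pulse excitation.
a = np.array([1.5, -0.8])
excitation = np.zeros(50)
excitation[0] = 1.0
excitation[25] = 1.0

# Synthesis: the excitation drives the all-pole filter defined by a.
speech = np.zeros(50)
for n in range(len(speech)):
    speech[n] = excitation[n]
    for k, ak in enumerate(a, start=1):
        if n - k >= 0:
            speech[n] += ak * speech[n - k]

# Analysis: the prediction error recovers the excitation.
residual = speech - predict(speech, a)
```

When the predictor matches the filter that produced the signal, the prediction error equals the excitation, which is why the residual is often treated as an estimate of the source.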

Fig 6: Discrete time signal

4.2 Sampling Theorem

Result, Discussion and Conclusion

Discussion

Speech processing is the extraction of speech parameters that shape the speech signal for suitable representation. The ultimate goal of studying Bangla vowels and consonants is to provide a complete computer-based speech processing system for Bangla. In this paper, Bangla vowels and consonants in isolated pronunciation are analyzed using a speech analysis method, and the parameters obtained from the method are compared.

Pitch: the pitch obtained by LPC has acceptable, comparable values for each vowel and consonant. The variation in the results may be due to the order of the LPC analysis and to the suppression of formants that are close together; LPC has greater flexibility for formant extraction. The results obtained from the study can be improved by recording the voice in a noise-free environment.
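A common way to read formant candidates from an LPC analysis is to take the roots of the prediction polynomial and convert their angles and radii to frequencies and bandwidths. The sketch below assumes this root-finding approach, which is not confirmed to be the procedure used in this study; the function name formants_from_lpc and the two resonances (700 Hz and 1200 Hz) are hypothetical values, not measured Bangla formants.

```python
import numpy as np

def formants_from_lpc(a, fs):
    """Formant candidates from predictor coefficients a[1..p] (s[n] ~ sum a[k] s[n-k])."""
    # Prediction polynomial A(z) = 1 - sum_k a[k] z^-k.
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    roots = np.roots(poly)
    roots = roots[np.imag(roots) > 0]           # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bandwidths[order]

# Hypothetical predictor built from two known resonances (centre Hz, bandwidth Hz).
fs = 8000.0
poly = np.array([1.0])
for f_hz, bw_hz in [(700.0, 80.0), (1200.0, 100.0)]:
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2.0 * np.pi * f_hz / fs
    poly = np.convolve(poly, [1.0, -2.0 * r * np.cos(theta), r ** 2])
a = -poly[1:]

formant_hz, bandwidth_hz = formants_from_lpc(a, fs)
```

With too low an order, two nearby resonances can be represented by a single pole pair, which is one way the LPC order can affect the extracted formants.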

This work can be extended in future research to the analysis of other phonemes or other speech units.

Conclusion