Design And Development Of Voice Transformation.

DESIGN AND DEVELOPMENT OF VOICE TRANSFORMATION

LILY LING AI LING

This report is submitted in partial fulfillment of the requirements for the award of Bachelor of Electronic Engineering (Computer Engineering) With Honours

Faculty of Electronic Engineering and Computer Engineering Universiti Teknikal Malaysia Melaka

DESIGN AND DEVELOPMENT OF VOICE TRANSFORMATION

Academic session: 2008/2009

I, LILY LING AI LING (in capital letters), agree to allow this Bachelor's Degree Project Report to be kept at the Library, subject to the following terms of use:

1. The report is the property of Universiti Teknikal Malaysia Melaka.

2. The Library is permitted to make copies for study purposes only.

3. The Library is permitted to make copies of this report as exchange material between institutions of higher learning.

4. Please tick ( ):

CONFIDENTIAL* (Contains information of security classification or of importance to Malaysia as set out in the OFFICIAL SECRETS ACT 1972)

RESTRICTED* (Contains restricted information as determined by the organisation/body where the research was carried out)

UNRESTRICTED

Certified by:

__________________________ ___________________________________

(AUTHOR'S SIGNATURE) (SUPERVISOR'S STAMP AND SIGNATURE)

Permanent address: ………...

“I hereby declare that this report is the result of my own work except for quotes as cited in the references”

Signature : ……… Author : Lily Ling Ai Ling

“I hereby declare that I have read this report and in my opinion this report is sufficient in terms of the scope and quality for the award of Bachelor of Electronic Engineering (Computer Engineering) With Honours.”

Dedicated to my beloved family member especially my father, mother and also to my

ACKNOWLEDGEMENT

First of all, I would like to thank my supervisor, Madam Juwita binti Mohd Sultan, for her valuable guidance in completing the project and thesis. I am especially grateful to my beloved father, mother and family members for all their support, patience and understanding regarding my study load and research work.

I would also like to acknowledge the contributions of my classmates at Universiti Teknikal Malaysia Melaka. The successful completion of this project would not have been possible without their great efforts and priceless support and help.

ABSTRACT

ABSTRAK (ABSTRACT IN MALAY)

CHAPTER 2 LITERATURE REVIEW

2.1 Introduction of Voice Transformation
2.2 Speech Model
2.3 Speaker Characteristics
2.4 Component of Voice Conversion System
2.4.1 Feature Extraction
2.4.2 Model Estimation
2.4.3 Voice Mapping
2.6.1 Computing Transformation Parameters
2.6.2 Unvoiced Section Transformation
2.7 Intonation Transformation
2.8 Sample Rate Conversion
2.9 Pitch and Frequency
2.9.1 Pitch Range
2.10 Pitch Synchronous Overlap Add (PSOLA)
2.11 Virtual Dubbing Process
2.11.1 Advantage of Virtual Dubbing
2.12 Application of Voice Transformation
2.12.1 Text to Speech Adaptation
2.12.2 Speaker Identification System
2.13 Matlab
2.13.1 History of Matlab
2.13.2 Rules on Variable and Function Names
2.13.3 Graphics

CHAPTER 3 METHODOLOGY

3.1 Introduction
3.2 Project methodology
3.2.1 Collect information
3.2.2 Understand basic of voice Transformation
3.2.3 Design source code
3.2.4 Testing the program
3.3 Monitoring program flow chart
3.4 Software
3.4.1 Matlab
3.5 Voice Analysis and Voice Mapping

CHAPTER 4 RESULTS AND ANALYSIS

4.1 Introduction
4.2 Results

CHAPTER 5 CONCLUSION AND SUGGESTION

5.1 Introduction
5.2 Conclusion and Recommendation

REFERENCES

APPENDIX

Appendix A: Source for the main program (Voice Transformation System)

LIST OF FIGURES

No. TITLE

2.1 Human vocal tract
2.2 TD-PSOLA Transformation of pitch, intonation and duration Parameters
2.3 Virtual Dubbing Block Diagrams
2.4 Example of comment with comment symbol
2.5 Example of using Matlab editor to select group of line
2.6 Example of comment out part of statement
2.7 Comment out text within a multiline statement
3.1 Flow chart of project
3.2 Flow chart of program
4.1 Blank GUI (default)
4.2 GUI Window
4.3 Property Inspector for Voice Transformation System
4.5 Output GUI for the Voice Transformation System
4.6 GUI when “Load Voice” was clicked
4.7 Output when either one of the option was clicked
4.8 Signal waveform of the user before and after the transformation (Cartoon Voice for 5 seconds)
4.9 Signal waveform of the user before and after the transformation (Cartoon Voice for 15 seconds)
4.10 Signal waveform of the user before and after the transformation (Man to Woman Voice for 5 seconds)
4.11 Signal waveform of the user before and after the transformation (Man to Woman Voice for 15 seconds)
4.12 Signal waveform of the user before and after the transformation (Woman to Man Voice for 5 seconds)
4.13 Signal waveform of the user before and after the transformation (Woman to Man Voice for 15 seconds)
4.14 The wav file that had prerecorded and save in the file.
4.15 Signal waveform of the user before and after the transformation (Load the file to transform the voice to cartoon voice)
4.16 Signal waveform of the user before and after the transformation (Load the file to transform the voice from woman to man voice)
4.17 Signal waveform of the user before and after the transformation

LIST OF TABLES

No. TITLE PAGE

LIST OF ABBREVIATIONS

DSP - Digital Signal Processing
DFT - Discrete Fourier Transform
DTW - Dynamic Time Warping
EM - Expectation Maximization
FFT - Fast Fourier Transform
FIR - Finite Impulse Response
GMM - Gaussian Mixture Model
HMM - Hidden Markov Model
HNM - Harmonic plus Noise Model
IIR - Infinite Impulse Response
LPC - Linear Predictive Coding

CHAPTER 1

INTRODUCTION

1.1 Introduction of Project

Speech is the most widely used means of human communication. We are born with the ability to speak, learn it easily during early childhood, and communicate with each other mostly through speech throughout our lives. With the development of communication technologies in recent times, speech has become an important interface for many systems. Compared with more complex interfaces, speech offers an easier way to communicate with computers.

1.2 Objectives of Project

This project has two objectives:

a. To design and develop the algorithm for a high-quality voice transformation system.

b. To analyze the resulting signal after transformation.

1.3 Problem Statement

Nowadays, voice transformation technology is used more and more widely in many fields, for example in the virtual dubbing process and in text-to-speech programs.

Besides the noise introduced by microphone devices, other factors can also affect the quality of voice samples. Examples include mispronounced verbal phrases, the use of different media for enrollment and verification (for instance, a landline telephone for enrollment but a cell phone for verification), and the emotional and physical condition of the individual.

1.4 Scope

The user can choose one of two voice transformation options and record for 5 or 15 seconds. This project mainly discusses the algorithm of the system and therefore does not include hardware design.

1.5 Methodology

First, after the project title was confirmed, the topic was researched by gathering important information from journals, reference books and internet resources. The features of Matlab and the basic concepts of voice transformation were studied.

After that, the graphical user interface (GUI) and the source code were designed in Matlab. The program was then checked, and troubleshooting was carried out whenever errors occurred.

The project was considered complete and successful once the program ran without errors.

1.6 Thesis Outline

This thesis reports the ideas generated, the concepts applied, the activities carried out and the final year project produced. It consists of five chapters: Chapter 1: Introduction, Chapter 2: Literature Review, Chapter 3: Methodology, Chapter 4: Results and Discussion, and Chapter 5: Conclusion and Recommendation.

Chapter 2 presents the literature review for this project. It covers the features of Matlab and the applications of voice transformation systems.

Chapter 3 briefly describes the method used in this project to solve the problem. It also covers the factors and reasons considered when choosing that method, and discusses its advantages.

Chapter 4 deals with the analysis of the results at the final stage, in which the voice transformation system was fully designed and implemented in Matlab. The monitoring source code is written in the Matlab language.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

Voice conversion aims at transforming the characteristics of a speech signal uttered by one speaker (the source speaker) in such a way that a human listener would believe that the transformed speech was produced by another specific speaker (the target speaker).

2.2 Speech Model

The human voice consists of sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc. More specifically, it is that part of human sound production in which the vocal folds (vocal cords) are the primary sound source. Generally speaking, the mechanism that produces the voice can be subdivided into three parts: the lungs, the vocal folds and the articulators. The lungs must produce adequate airflow to vibrate the vocal folds (air is the fuel of the voice). The vocal folds are the vibrators, neuromuscular units that ‘fine tune’ pitch and tone [1]. The articulators (the vocal tract, consisting of the tongue, palate, cheeks, lips, etc.) articulate and filter the sound.

Figure 2.1: Human vocal tract [2]

pharynx and out through the nasal and oral cavities. In English there are four different types of sounds that can be created: aspiration noise, frication, plosion and voicing. Voicing is a quasi-periodic vibration of the vocal folds. The frequency of this vibration is called the fundamental frequency, or F0, and is perceived as pitch.

A voice frequency (or voice band) is one of the frequencies, within part of the audio range, that is used for the transmission of speech. The voiced speech of a typical adult male has a fundamental frequency of 85 to 155 Hz, and that of a typical adult female 165 to 255 Hz [3]. Thus, the fundamental frequency of most speech falls below the bottom of the "voice frequency" band as defined above [4].
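As a rough illustration of how the fundamental frequency relates to these ranges, the following sketch estimates F0 from a short voiced frame by picking the largest autocorrelation peak. It is written in Python for illustration only and is not the method used in this project, which was implemented in Matlab. The function names (synth_voiced, estimate_f0, classify_f0), the 8000 Hz sample rate, the 60 to 400 Hz search range and the synthetic two-harmonic test tone are all assumptions made for this example.

```python
import math


def synth_voiced(f0, sample_rate=8000, duration=0.1):
    """Build a synthetic 'voiced' frame: a fundamental plus a weaker 2nd harmonic."""
    n = int(sample_rate * duration)
    return [
        math.sin(2 * math.pi * f0 * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / sample_rate)
        for t in range(n)
    ]


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 in Hz as sample_rate / lag, where lag maximizes the autocorrelation.

    Only lags corresponding to f_min..f_max (assumed search range) are tried.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def classify_f0(f0):
    """Compare an F0 value with the typical adult ranges quoted in the text [3]."""
    if 85 <= f0 <= 155:
        return "typical adult male range"
    if 165 <= f0 <= 255:
        return "typical adult female range"
    return "outside the quoted ranges"


if __name__ == "__main__":
    for true_f0 in (100.0, 200.0):
        estimate = estimate_f0(synth_voiced(true_f0), 8000)
        print(true_f0, round(estimate, 1), classify_f0(estimate))
```

Real speech is only quasi-periodic and noisy, so a practical pitch tracker would normally add windowing, voiced/unvoiced detection and smoothing across frames. The sketch omits these to keep the idea visible.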

2.3 Speaker Characteristics

Speech may differ between speakers in a very large number of respects. These can be divided into three main types of speaker identity:

a. Segmental: In linguistics, the term segment may be defined as "any discrete unit that can be identified, either physically or auditorily, in the stream of speech" [5]. Segments are called “discrete” because they are separate and individual, such as consonants and vowels, and occur in a distinct temporal order.
