Voice recognition system for Massey University Smarthouse

Academic year: 2024

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author.

Voice Recognition System for Massey University Smarthouse

A thesis presented in partial fulfilment of the requirements for the degree of

Master of Engineering

in

Information Engineering

at

Massey University, Auckland, New Zealand

Rafik Gadalla

2006

Acknowledgment

I would like to express my sincere gratitude to my project supervisor, Dr Tom Moir, for his guidance and support throughout the duration of this project.

I would also like to thank my colleagues Leon Kok, Grettle Lomiwes and Vaitheki Yoganathan and other members of the smarthouse projects for their effort and dedication towards the formation of the smarthouse.

I'm grateful to my family for all their support and encouragement. In particular I would like to thank my parents for their love, patience and encouragement, and my fiance Manal, my brothers and friends, without whose assistance this project would not have been possible.

And above all, I would like to thank God for making it all happen.

Abstract

The concept of a smarthouse aims to integrate technology into houses to a level where most daily tasks are automated and to provide comfort, safety and entertainment to the house residents. The concept is mainly aimed at the elderly population to improve their quality of life.

In order to maintain a natural medium of communication, the house employs a speech recognition system capable of analysing spoken language and extracting commands from it. This project focuses on the development and evaluation of a Windows application developed with a high level programming language which incorporates speech recognition technology by utilising a commercial speech recognition engine.

The speech recognition system acts as a hub within the Smarthouse to receive and delegate user commands to different switching and control systems.

Initial trials were built using Dragon Naturally Speaking as the recognition engine. However that proved inappropriate for use in the Smarthouse project as it is speaker dependent and requires each user to train it with his/her own voice.

The application now utilizes the Microsoft Speech Application Programming Interface (SAPI), a software layer which sits between applications and speech engines, and the Microsoft Speech Recognition Engine, which is freely distributed with some Microsoft products. Although Dragon Naturally Speaking offers better recognition for dictation, the MS engine can be optimized using Context Free Grammar (CFG) to give enhanced recognition in the intended application. The application is designed to be speaker independent and can handle continuous speech. It connects to a database oriented expert system to carry out full conversations with the users. Audible prompts and confirmations are achieved through speech synthesis using any SAPI compliant text to speech engine.

Other developments focused on designing a telephony system using the Microsoft Telephony Application Programming Interface (TAPI). This allows the house to be remotely controlled from anywhere in the world: residents can call their house regardless of their location, and the house will respond to and fulfil their commands.

Table of Contents

Acknowledgment
Abstract
List of Figures
List of Abbreviations
Chapter 1: Introduction
1.1 Massey Smarthouse
1.1.1 Location and Positioning System
1.1.2 Voice separation system
1.1.3 House Management System
1.1.4 Remote Switching System
Chapter 2: Background
2.1 How Speech Recognition Works
2.1.1 Extracting Discriminating Speech Features
2.1.2 Extracting Phonemes
2.1.3 Applying Grammar and Language Models
2.2 SAPI
2.3 TAPI
2.3.1 API
2.3.2 TAPI server
2.3.3 Service provider interface
2.4 HMM
2.5 XML
2.6 Other Smart House Projects
2.7 Speech Interface Standards
2.7.1 VoiceXML
2.7.2 SALT
2.8 Speech Recognition in Commercial Environments
Chapter 3: Problem Formulation and System Requirements
3.1 Problem Formulation
3.2 Specifications
Chapter 4: System Implementation
4.1 Speech Interface Design concepts
4.2 Speech Recognition Implementation
4.2.1 System Initialization
4.2.2 Command Execution
4.2.3 Performing other Functions
4.2.4 Commands Database
4.2.5 Speech Recognition flow chart
4.3 Telephony Implementation
4.4 Graphical User Interface
4.4.1 GUI Design Considerations
4.4.2 Application Interface
Chapter 5: System Testing
5.1 Preliminary Tests
5.2 Final Evaluation Methodology
5.2.1 Room Setup and Recording Equipment
5.2.2 Selection and training of Subject Speakers
5.2.3 Design of command phrases
5.2.4 Introduction of Noise
5.2.5 Telephony Testing
5.2.6 Analysis of Sound files
Chapter 6: Results and Discussion
6.1 ASR Engines Feature Comparison
6.1.1 Dragon Naturally Speaking
6.1.2 Microsoft SAPI Kit
6.1.3 Vocon 3200
6.1.4 IBM ViaVoice
6.2 System Evaluation Results
6.3 Improving Recognition Accuracy
Chapter 7: Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
References
Bibliography
Appendices
Appendix A: Matlab Script Used During Testing Process
Appendix B: Grammar File Used During Testing Process

List of Figures

Figure 1.1: The different components of the Massey University Smarthouse and how they integrate together
Figure 1.2: Bluetooth watch worn by Massey Smarthouse occupants
Figure 2.1: Amplitude vs. time graph for the phrase "Massey University"
Figure 2.2: Spectrograph of the phrase "Massey University"
Figure 2.3: Structure of a continuous ASR engine
Figure 2.4: SAPI architecture
Figure 4.1: Communication between the speech system and expert system
Figure 4.2: Application's Splash Screen
Figure 4.3: Smarthouse command database schema
Figure 4.4: Simplified flow chart of the speech recognition application
Figure 4.5: Flow chart of telephony handling application
Figure 4.6: Standard naming convention used in the application's top menu
Figure 4.7: Logical grouping of components
Figure 4.8: The application's main form. The different buttons and dialogs are numbered one to eight.
Figure 4.9: Microphone Training Wizard
Figure 4.10: Add New Words Wizard
Figure 4.11: User Training Wizard
Figure 4.12: Help-About Window
Figure 5.1: Program used to assist speakers to read commands
Figure 5.2: Program used for testing and ASR engine accuracy
Figure 6.1: Recognition accuracy graph for Speaker 1 (female with American accent)
Figure 6.2: Recognition accuracy graph for Speaker 2 (female with New Zealand accent)
Figure 6.3: Recognition accuracy graph for Speaker 3 (male with foreign accent)
Figure 6.4: Recognition accuracy graph for Speaker 4 (male with Scottish accent)
Figure 6.5: Recognition accuracy graph for Noise 1 (Traffic Noise)
Figure 6.6: Recognition accuracy graph for Noise 2 (Crowd Noise)
Figure 6.7: Recognition accuracy graph for Noise 3 (Children Talking Noise)
Figure 6.8: Recognition accuracy graph for Noise 4 (Ambient Music)
Figure 6.9: Recognition accuracy graph for Speaker 1 using bandpass filtered speech to simulate telephony quality speech
Figure 6.10: Beamformer microphone array

List of Abbreviations

SAPI    Speech Application Programming Interface
CFG     Context Free Grammar
TAPI    Telephony Application Programming Interface
TCP     Transmission Control Protocol
IP      Internet Protocol
PCM     Pulse Code Modulation
LPC     Linear Predictive Coding
FFT     Fast Fourier Transform
MFCC    Mel Frequency Cepstral Coefficient
HMM     Hidden Markov Model
ASR     Automatic Speech Recognition
TTS     Text To Speech
API     Application Programming Interface
DDI     Device Driver Interface
XML     eXtensible Mark-up Language
COM     Component Object Model
SPI     Service Provider Interface
DTMF    Dual Tone Multi-Frequency
SALT    Speech Application Language Tags
IVR     Interactive Voice Response
RMS     Root Mean Square
SNR     Signal to Noise Ratio

Chapter 1: Introduction

The purpose of this project was to develop a voice recognition system that can be used in the Massey University Smarthouse to respond to the occupants' needs and desires simply by taking their voice requests and transforming them into actions. The system acts as a hub that services and delegates all voice requests to other control systems within the house.

1.1 Massey Smarthouse

Massey University Smarthouse is a collaborative research and development project among the Institute of Information and Mathematical Sciences, the Institute of Technology and Engineering and other industry partners. The goal of the project is to create a house where technology and appliances in the house help make life easier, safer and more enjoyable for its occupants. It responds to the needs and desires of occupants by, for example, monitoring their health, adjusting lighting, temperature, or even ambient music to their personal preferences, and wherever possible assists them in all their daily tasks. The Smarthouse's main aims are:

• Monitor the health and safety of its occupants, by using the latest in information systems and biotechnology.

• Automate common house management tasks, thus allowing inhabitants to have a more enjoyable and comfortable life.

• Provide information and entertainment to the occupants upon their demand.

It should hide the technique and details of how it works and be completely intuitive to use (Human Centred Design).

The main beneficiaries of this project will be the elderly population who want to retain their independence, and their families and friends who can be secure in the knowledge that they are safe, well and comfortable. The health sector will benefit by being able to more effectively help and monitor people in their care. There will also be a number of other benefits for the construction industry, appliance industry, and for other people who wish to improve their quality of life.

The Massey University Smarthouse [1] will be a world-class showcase for the integration of house automation, health care and smart appliance technology. Figure 1.1 provides an overview of the different components of the Smarthouse that are discussed below.

Figure 1.1: The different components of the Massey University Smarthouse and how they integrate together (Voice Separation, Voice Recognition, Artificial Intelligence, Location System and Remote Switching components, plus the Internet for weather information, TV listings, etc.)

1.1.1 Location and Positioning System

Tracking the position of occupants and devices within the house is essential to attain smart control and monitoring. The smarthouse therefore will be equipped with a Bluetooth ubiquitous network that consists of transceiver nodes that span across the roof of the entire house.

The occupants of the house will be wearing a Bluetooth transmitting watch that contains their uniquely identifiable code that lets the house know who they are, and exactly where they are within the house.

Figure 1.2: Bluetooth watch worn by Massey Smarthouse occupants

1.1.2 Voice separation system

To enable the smarthouse to be controlled by voice, two approaches can be taken. The first is for all occupants to wear a voice capturing device, in the form of a headset, watch or other. The second is to use wall or roof mounted microphones to allow for distant speech recognition. Because the first approach is restrictive to the occupants, Massey University's Smarthouse will be using beamformer microphone arrays. The development of the beamformer arrays utilises some well known beamforming algorithms to minimize noise and provide clean, high quality speech for the speech recognition system. The main algorithm used will be a modified version of the Griffiths-Jim beamformer.
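As a rough illustration of what a beamformer does, the sketch below implements only a fixed delay-and-sum stage, the non-adaptive part on which a Griffiths-Jim structure builds. It is not the modified algorithm used in the Smarthouse: the function name, the integer sample delays and the two-microphone data are all assumptions made for this example, and the adaptive noise-cancelling path is omitted entirely.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer sample delay and average.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   per-channel arrival delay in samples (assumed known here).
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    n = len(channels[0])
    out = []
    for t in range(n):
        total = 0.0
        for samples, d in zip(channels, delays):
            idx = t + d  # read ahead in the later-arriving channel to line it up
            if 0 <= idx < n:
                total += samples[idx]
        out.append(total / len(channels))
    return out


# Hypothetical data: the same pulse reaches the second microphone one sample later.
mic1 = [0.0, 1.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 0.0]
aligned = delay_and_sum([mic1, mic2], delays=[0, 1])
```

With the correct delays the pulse adds coherently and keeps its full amplitude, whereas with no delay compensation it is smeared across two samples at half amplitude. Sound arriving from other directions does not line up and is attenuated in the same way.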

1.1.3 House Management System

The house management system is a PC based software that contains all the rules that govern the operation of the house. It will act as the central control unit that will be communicating all the necessary information to and from other components within the house. The application will be equipped with an expert system implemented in the form of a database. The system will collect information from the location system, the speech recognition system and the different sensors within the house to manage the daily operations of the house in an intelligent manner.
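The idea of holding the house rules in a database can be sketched as a simple lookup from a recognised phrase to the subsystem and action that should handle it. This is a minimal sketch only: the table name `commands`, its columns and the example rows are hypothetical and do not reproduce the Smarthouse command database schema.

```python
import sqlite3


def build_rules(conn):
    """Create a hypothetical command table and load a few example rules."""
    conn.execute(
        "CREATE TABLE commands (phrase TEXT PRIMARY KEY, target TEXT, action TEXT)"
    )
    conn.executemany(
        "INSERT INTO commands VALUES (?, ?, ?)",
        [
            ("lights on", "remote_switching", "LIGHT_ON"),
            ("lights off", "remote_switching", "LIGHT_OFF"),
            ("play music", "entertainment", "MUSIC_PLAY"),
        ],
    )


def lookup(conn, phrase):
    """Return (target, action) for a recognised phrase, or None if no rule matches."""
    return conn.execute(
        "SELECT target, action FROM commands WHERE phrase = ?",
        (phrase.strip().lower(),),
    ).fetchone()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
build_rules(conn)
result = lookup(conn, "Lights On")
```

Keeping the rules as rows rather than as program logic means new commands could be added without recompiling the application, which is one plausible reason for a database oriented design.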

1.1.4 Remote Switching System

Switching and control of appliances is made possible by a TCP/IP switching system built using an embedded system that, although capable of being used as a single and stand-alone device to aid in home-automation, also integrates into the smarthouse environment, allowing a number of smart appliances to be networked and controlled by the house management system. The device offers a simple web browser interface to show the status of connected devices at any given moment.
