• Tidak ada hasil yang ditemukan

/(L ,X) - ePrints@IISc

N/A
N/A
Protected

Academic year: 2024

Membagikan "/(L ,X) - ePrints@IISc"

Copied!
4
0
0

Teks penuh

(1)

PFORMANCE EVALUATION OF AUTOMATIC &PEAKER RECOGNITION SCHEMES

ABSTRACT

I), Venugopal and

v,V,S,

Sarrna

Indian

Institute of Science Bangalore 560012

A

mathematical formulation of an auto- matic speaker verification scheme as a two

class pattern recognition problem is presen- ted. Expressions for

the expe cted values

and

the

variance of

the design—set and the test set

error rates are

derived,

The bound on the

performance of an automatIc speaker id- entification system as a cascade of indepe- ndent verification systems is derived,

The

implications

of

these results in the design

of az automatic sp eaker recognition, system

are discussed.

I INTRODUCTION

The problem of performance evaluation of

any automatic

speaker verification sys-

tem

(ASVS)

Is yet to be satisfactorily SQl- ved.

In general pattern recognition lite- rature, the performance estimation has re- ceived

considerable notice recenty and the importance of these results in the design

of an

AEVS

is

discussed in

a recent note of the

authors'.

In

this paper, an

ASVS

is analysed as

a

2—class pattern recognition problem to bring

out

explicitly the effect of the va- rious

parameters on the performance esti-

mation following the mathematical model of

a patern verification system provided by

Dixon •

The

expected error rates for both the design-set and the test-set are derived

as a function of the number of samples per speaker (N), the number of features

(L)

,the

number

of customers (M), the

number

of de- sign impostors

(K)

and the Mahalanobis dis-

tance ()

between

the classes

under Gaussi-

an assumptions, The variance of the design-

st

error rate is derived bringing

out the importance of choosing sufficient

number

of Impostors

at the system design stage, The expected error rates for an automatic spea- ker

identification system (ASI) as a cas- cade of

N

independent

ASVS

are derived •

The

importance of these resalts in the design of an automatic speaker recognition system is pointed out.

II

MATHEMATICAL NODEL FOR ASVS

In an ALVS,

a

given speaker not nece- ssarily belonging to the system,

utters a

780

predetermined phrase,

He

also presents a label claiming that he is a particular

ttcustomeru belonging to the system. In the system, a predetermined set of features, possibly depending on the label entered,

is extracted from the utterance and the sp- eaker is either accepted or rejected.

Let

the ASVS

be designed for a set of

M known customers S il,...,MJ

and

a single alien group,

S0

of all

unknown

speakers (rest of the world) This mom-

ion of an alien group of speakers

dis-

tinguishes an

ASVS

from an

ASIS

and is es- sential as there is always a chance of a person not belonging to the set S trying to impersonate as one of S. Corresponding

to each

member

of S, there is a label L.(j=

If any speaker wants to

be vei-

f

as d under the

label L1, then we define two

classes: the C. ciss consisting of the speaker

S• and theimposter class C' con-

sisting the remaining

(N—i)

speaers and

the alien group S.

Cj={s;cj=so,s, i=1,,,•,M, j=l, . .

The

system on the

basis of the feature ve-

ctor X of

dimension L will

assign

the sp-

eakers to

C

or Cj. The accept/reject rule using the optimal Bayesian classifier is to

to accept the claim of

a

particular speak- er as valid if

P(C/L ,X)

>

P(C/L

,x) (1)

and

reject

it otherwise,

The aposteriori probabilities are gi-

ven

P(X/L,Cj)P(L/C)P(C) /P(L,X) p

P(Cj/L ,X) = p(X/Lj /(L (2b)

,X)

where

the probabilities have the usual me- anings

and

p(L./C.) is the probability of

speaker

b waa.jn to

be verified under

his

on

libel L. and p(L/C.) is the prob- ability of an ".mpostor"

trng

to present

label

L., While the actual values of the

variousprobabilities of (2a) and (2b) de-

pend upon the conditions in a particular

environment,

we may

assume in most cases

(2)

and

p(

)'>p(C1), assum- ing3equal

a piori probabdities for

all sp-

eakers1

A

reasonable assumption considering

the

above inequalities is (L./C.)p(Cj)=

p(L/C1)p(C). Again, it may

e

ostulated that

ifi

the presence of the knowledge of

class Ci the feature vector is independent of

the label L,

p(X/Lj,Cj)=p(X/Cj). The de-

cision

rule (l can

now

be rewritten as

£ccept

if p(X/L3,C3)>p(X/L3,C3) (3) and reject otherwise,

The class-conditional densities in (3)

are

given by

pc/C3)=pç/S3)

= l,...,M (4)

N

pOc/s1)p(S) i=O,ij

where

p(X/5j), i=O,l,,,.,M

are speaker-con- ditional densities of the feature vector X,

If the distributions p(X/$1),i=O,..M are

completely

known, then

the error rate

can

be

calculated exactly. Otherwise pj)

i=l,...,M are

to

be estimated from

N

label- led samples of each speaker of sot 8 where as p(X/80) is to be estimated from a set of

"impostor

references" (of speakers of Just as the number of training samples per

class is finite, the number of impostors(I that can be

considered to represent the gr-

oup S

is also finite.

Nature of the Class-Conditional Densities:

Assumption: p(X/C3)'N(3,E) and

p(X/C) N(3,

where

E3=E,E3=and

p=M+K- are

the known

covariance matrices of the

two

cl- asses C and C and

the means f.h. and h3 are to

be estirnate. from

N

design simples of_

class C and

pN design

samples of class C.

Remark:

It may not

be unreasonable to ass-

ume p(X/S-)=p(X/C1) to be Gaussian. Then

p/) i a

finite mixture of Gaussians, This

will

be

a

multimodal distribution for

small

N

and K. Again it may not

be unrea-

sonable, for large

N

and K, to fit

a

Gauss-

ian distribution to samples of class C3.

Classifier 2

For

notational convenience,we denote C3 by Cl and C3 by C2 and p(X/C1)r"

N(1,E) and p(X/C2)-" N'2,pE). For

this ca- se of unequal

means

and

covariance

matrices the

minimum probability o± error classifier is

the one using quaciratic diecriminaxtt fun-

ction. In this

paper, the minimax linear dis- criminant with

eual

error rate is

used for further

analysis. We define

a

linear die

cr

iminant

' —l *

d

=

) (5)

where = t+(l-t)

where t

1

is

a

parame-

ter

of our choice and =(l/N) E31

and

u

=

(l/PN)E!i X2. where X1. is the

jth labelled sample of 3class i. or equal

error

rate the threshold

9

is given by O=d'

(B1"2

•I-

h)/(l/2+l)

(6)

Therefore,

a

sample

X

is assigned to class

781

C1 if

dtXe,

i.e.

O

and to class C2, otherwise, (7) III,

TEI

AND DESIGN-SE7 ERROR RATES FOR

AN 81TS

The ASVS may be tested either by new utterances of the speakers belonging to S

and

S (test set) or by the sample utter-

ances used Jor designing the system (desi-

gn set)0

Test-set error rate: The

expected

test—set

error

rate

()

may be

written from (7) as

T=pr 4! (E)

X-

I

where

X is the feature vector correspond- (8) ing to an

arbitrary new utterance from cl-

ass C1.

Proposition

1 2 CT in (8) may be expressed as

the prbability of the ratio of two non-

central variates and

'2

being greater than the quantity (l-l)/(l+fl).

(l-P1)/(l+P1)] (9)

where

and °2 are distributed as

and

2(L,2).

2

X1= [2(l+P1)]

pNjq3+l)- [1+2+p+l)2NiJ 2

x= [2(1- F1)jN

(

+lYi-[l+ 2

(p+i)

2N]}

z2

= Mahalanobis squared distance between the

two

(3/2..j) populations = (p+l) (l+2+(p÷l) (h)

t

2N)

Proof : (The proof follows that of Moron4.

We

define

two

random vectors u and v such

that

u=

(pN/p+l)' (E') (h-)

and

2

'

i+p +(+l)

N)T

2u1+u2

where

Then

T can be written as

T=pr(u'v O)=prj(u+v)' (u+v)-(u-v)' (u-v)O1.

(u+v)

and (u-v) are die tributed independe-

ntly with dispersion matrices 2(l+Pl)IT and

2(l-P1)I, where P1

is the corre1aion coefficient

etween corresponding pair of elements of

u

and

v

and is the LXL unit matrix0

Thus if,

w1—(-) (u+v)1

(u=v)/(l+Pi)

and m94(u_v)t (uv)/(1—P), eqn.(9) follOws.

Detailed proof is given in reference 5.

Proposition 2 : It is also possible to ex- press the expeced error rate

CT

in closed form expression

T

=

Q(L,X1,

2' where (10)

(3)

0+02

P1)1-c(e1,o2)-exp(- 2

m=l4L12 'm (eie2) { B;1p)(jL+in,

jLm)5m03. (11)

where 0=(l+ e2(1- 1)c(e,)

is

the circular coverage function, I (z) the modified Bessel function of

fir&

kind

B(p,q) the incosiplote p-function. The up- per

sign is for

m

0

and

the

lower sign for m<0,

Design—set error rate : The expected de-

sign set error rate may be written from eqn,(7) as

— pr[(-) (E')X1—

+1

ol (12)

where X. is the feature vector correspond- ing to

M arbitrary

utterance from the de-

sign set of class C1,

Proposition 3: in (12) may be expressed as the proability of the ratio of two non-

central

' variates m and w being grea-

ter than the quantity

=pr[o3/w4

(l-?2)/(l+F2)7 (13) where

w

and are distributed as %2(L,213) and

x342(l+F2)]8s+l)

+

(+1)2NJ E2

= [2(1- r2) p+1) -3/2(+2)

+

2 2 2

Proof

: The proof is similar to that of proposition 1,

Proposition 4; It is

also possible to ex-

press

sed form

the expected

expression

rror rate in a

clo-

= Q(L,)3,

X4,

F),

where (14)

) is

the same function as de- fined in

(lL

with K. and

k

and being

replaced by )4

an

F2

repectivly,

Proposition 5: The variance

_2

of

the ran-

dom

variable

eD

is given

by

=

(i-c )/(p+l)N (is)

The

proof follows that of Foley and

is gi-

ven in reference 5,

-. In Fig. 1 and

2

the values of

and

are

plotted as functions of N/i

and B.

Fig,

1 gives the nature of the biases that

creep into the eatimates for small

N/L.The

test—set error rate is pessimistically

bi-

ased and the

design—set error rate is opti

mistically biased, Fig,

2 shows

that for an SV$ the expected error rates

become

in-

dependent of population size for large

po-

pulation,

It may

be seen from (15) that the

variance of the design-set error rate is inversely proportional to the number of recorded sample utterances for speaker and

the total number of speakers including the design impostors, Fig, 2 and (15) show that for

small

number of customers (M if suff i-

cienily

large

number

of_design_impostors

(K)

are not used, both

and

are

opti-

mistically

biased and

ar not reliable,

However, for large N,

a

large

K

is not essential,

\rEsTEr

-\

-Z TLT DE5J

SET

S

Fig,2-Expected Error Rates as Functions of Population Size

0-5 -

0-4

cc cc

O

0

Cii

Ui 0.2

C— U

Ui 0

Ui

oc

TEST SET

0

2 3

4 5 6

7

6

EJ

IL

Fig,l-Expected

Error Rates as Function of N/L

UI

0 a:

Cii

Ui

C—

'C

Lu

lL4

AC-J

4 r

I0 20 30 40 50 60 70 80

0

782

(4)

IV PERFORM1NCE EVALUATION OF

IS

s

ASIS can

be

reali2:ed as

a

cascade

of (N-i) or

N

ASVS's as shown

in Fig.3All

the A$VSs are

assumed

to have identifical performance. If there is

a

reject option at Mth stage also the possibility of (14+1)

classes corresponding to

N

customers and

an alien class (as not belonging to the system) can be introduced in an

ASIE

as

well,

On

the other hand, the decision can

be

terminated at (M-l)th stage and spea-

ker

814 can be

accepted.

Let p be the probability of error

and

q the probability of

correct

decision of

an

ABVS, Assuming

that the jth speaker has te-

st,ed the system,

we

can draw the decision tree

as shown in Fig•4, If D1, i l,.,.,M is the decision taken by the system at the ith stage that the speaker is

S,

then we

can

write

the probability of correct deci-

sion

s N

=

E P(5.,D.) = E P(D./S.)P(S.) (16)

j=

.3 .3

Assuming

equal apriori probabilities

for all speakers belonging to

S

and from

Fig.

4,

we

can write

=

(j/) ( j

=1

q) (,/) (

qM

/ (1-

q)

(17)

Equation

(19) shows the effect of popula-

tion si7e on the performance of an ASIS,, thus corroborating

Dodding

ton' s

results

'.

V DISCUSSION OF RESULTS

The

design of an

ASVS

proceeds in

three steps: (i) Data

base preparation,

(ii)

feature selection and

extraction

and

(iii)

statistical classification and

per-

formance evaluation. All

the

stages are,of course,

interrelated.

The

results of sect- ion III provide information on: (i)Prepa- ration of data set

(number

of design

sam-

pie utterances per speaker (N)),(ii)

The

dimension

of

the

feature vector (L). If

niL

ratio is small there will be wide dis- parities in performance estimates that

wifl.

be obtained if

the

system is tested on the design

set or on an independent test set, (iii) The discriminating ability of

a

fea- ture depends on the appropriate distance between

the classes concerned. If the un-

derlying distributions are Gaussian the distance between classes itself provides an estimate

of

error, It should

be kept in mind,

however, that the distance estimate

from

a finite

number

of samples per

class

is a biased estimate of the true distance between the populations, (iv) The error rates

and

are functions of p

(Fig.2), Even

if an

ASV

is to be designed for a

small number of customers N, a sufficiently large number

K

of

impostors should be con-

sidered in the

design set so as to make the performance

estimates reliable. For large

N

this

may

not be so important. The design

of a verification system is thus equally

complex for small or large

N

because of the presence of the alien class,

REFERENCES

1,

V.1T.S, Sarma and

D,

Venugopal,

"Performance evaluation of automatic speaker verification systems", IEEE

Trans,Acoustics, Speech, Signal Processing (to be published).

2

R,C.Dixon and P,E. Bourdeau, 'tMathe

matical model

for pattern verification",

IBM J. of

R

and D,

Vol, 13,

pp

717-721, Nov.

1969.

3. T,W,Anderson and R.R.Bahadur,

"Classi- fication into

two multivariate normal distributions with different covariance matricest', Ann, of Math, Statist,, Vol.33

pp

420-431, June 1962.

4. M,A.Moron, "On the expectation of erro ro of allocation associated with

a

linear diseriminant function", Biome trika, Vol.

62, pp 141-148,

Apr.

1975,

5. V,V.S,Sarma and B, Venugopal, "Statis- tical problems in performance assessment of Automatic Speaker Recognition Systems"

CI? Report No,61, Dept, of BCE, Indian Inst0 of Science, India, Jan. 197?, 6, D.H, Foley, "Considerations of sample and feature size", IEEE Trans,Information Theory, Vol.17-18, pp 618-626,Sept,1972.

7, A,E,Rosenberg, "Automatic speaker yen—

fication:

a

review", Proceedings IEEE, Vol.64, pp 475—487, Apr. 1976,

—--i

r1

'r

I

YE

DECISJOM

'

)40

0

U- Vt >

-1 cI

tnI

kio 4o

u-I

>1

w

YE5

Fig,3 - ASIS as

a

Cascade of ASVS

Dj

Fig,4 —

Decision

Tree for Speaker j

783

Referensi

Dokumen terkait

T1:Piecewise-linear auditory filters, TZGaussian auditory filters, T3:DESA- l,T4DESA-2,T5:PTFD, T6Hilben transform absence of noise, the effect of parameters such as carrier frequency

共21兲, determine the force-extension behavior of the random A – B block copolymer as a function of the following parameters: 共i兲 the number of prepolymers n in the chain共which is kept

We shall consider three different strategies for us- ing Pτ to probe the SUSY model and determine the model parameters: A discrimination between SUSY models, B partial determination of

Based on a priori domain knowledge of the data as well as through preliminary anal- ysis on the training data, the parameters of a minimum support, for frequent item generation, b block

THATHACHAR Absfracf-New criteria in the multiplier form are presented for the input-output stability in the &-space of a linear system with a time-varying element k t in a

Further, PbASKO sporozoites obtained with and without the supplementation of asparagine in mosquitoes showed a similar pattern of liver-stage development PbWT + Asnase PbWT + Asnase

Second, we estimated the male and female genomic coverage of each scaffold to identify X-linked and autosomal genes in the species for which no linkage maps were available: In case of a

In the absence of any mathematical model describing a relationship between the experimental parameters gas pressure, rod diameter and gap spacing and the corresponding breakdown