• Tidak ada hasil yang ditemukan

13 1 ADPCM in Speech Coding

N/A
N/A
Protected

Academic year: 2019

Membagikan "13 1 ADPCM in Speech Coding"

Copied!
36
0
0

Teks penuh

(1)

Chapter 13 Basic

Chapter 13 Basic

Audio

Audio

Compression

Compression

Techniques

Techniques

13 1 ADPCM in Speech Coding

13.1 ADPCM in Speech Coding

13.2 G.726 ADPCM

13.3 Vocoders

(2)

13 1 ADPCM in Speech Coding

13 1 ADPCM in Speech Coding

13.1 ADPCM in Speech Coding

13.1 ADPCM in Speech Coding

y

ADPCM

forms the heart of the ITU's

y

ADPCM

forms the heart of the ITU s

speech compression standards

G.721,

G.723, G.726, and G.727.

y

The di erence between these standards

involves the

bit-rate

(from 3 to 5 bits per

sample) and

some algorithm details

sample) and

some algorithm details.

(3)
(4)

13 2 G 726 ADPCM

13 2 G 726 ADPCM

13.2 G.726 ADPCM

13.2 G.726 ADPCM

y

ITU G 726 supersedes ITU standards

y

ITU G.726 supersedes ITU standards

G.721 and G.723.

y

Rationale: works by

adapting a

xed

quantizer

in a simple way. The di erent

q

p

y

(5)

y

The standard de

nes a

multiplier constant

y

The standard de

nes a

multiplier constant

(6)

y

The input value is a

ratio

of

a di erence

ith th

f t

α

with the

factor

α

.

(7)

Backward Adaptive Quantizer

Backward Adaptive Quantizer

Backward Adaptive Quantizer

Backward Adaptive Quantizer

y Backward adaptive works in principle by noticing Backward adaptive works in principle by noticing

either of the cases:

◦ too many values are quantized to values far from ld h if i i i f

zero – would happen if quantizer step size in f were

too small.

◦ too many values fall close to zero too much of the y time

x would happen if the quantizer step size were too large.

y Jayant quantizer allows one to adapt a backward y Jayant quantizer allows one to adapt a backward

quantizer step size after receiving just one single output.

◦ Jayant quantizer simply expands the step size if the quantized input is in the outer levels of the quantizer, and reduces the step size if the input is near zero.

(8)

The Step Size of Jayant Quantizer

The Step Size of Jayant Quantizer

The Step Size of Jayant Quantizer

The Step Size of Jayant Quantizer

y Jayant quantizer assigns Jayant quantizer assigns multiplier values Mmultiplier values Mkk to to

each level, with values smaller than unity for levels

near zero, and values larger than 1 for the outer le els

levels.

y For signal fn, the quantizer step size ∆ is changed

according to the quantized value k, for the according to the quantized value k, for the

previous signal value fn−1, by the simple formula

y Since it is the quantized version of the signal that

(9)

G.726

G.726 —

— Backward Adaptive Jayant

Backward Adaptive Jayant

Quantizer

Quantizer

y G.726G.726 uses uses xed quantizer steps based on the xed quantizer steps based on the

logarithm of the input difference signal, en divided by

(10)

y The "unlocked" part adapts via the equationp p q

y The locked part is slightly modip g y fied from the unlocked partp :

y The G.726 predictor is quite complicated: it uses a linear p q p

combination of 6 quantized differences and two

(11)

13 3 Vocoders

13 3 Vocoders

13.3 Vocoders

13.3 Vocoders

y Vocoders – voice coders which cannot be y Vocoders voice coders, which cannot be

usefully applied when other analog signals, such as modem signals, are in use. g

◦ concerned with modeling speech so that the salient features are captured in as few bits as possible.

◦ use either a model of the speech waveform in time (LPC (Linear Predictive Coding) vocoding), or ... → ◦ break down the signal into frequency components

◦ break down the signal into frequency components

and model these (channel vocoders and formant vocoders).

y Vocoder simulation of the voice is not very good

(12)

Phase Insensitivity

Phase Insensitivity

Phase Insensitivity

Phase Insensitivity

y A complete reconstituting of speech waveform is A complete reconstituting of speech waveform is

really unnecessary, perceptually: all that is needed is for the amount of energy at any time to be

ab t ri ht and the si nal ill s nd ab t ri ht about right, and the signal will sound about right.

y Phase is a shift in the time argument inside a

function of time. function of time.

◦ Suppose we strike a piano key, and generate a roughly sinusoidal sound cos(ωt), with ω =2πf.

◦ Now if we wait sufficient time to generate a phase shift π/2and then strike another key, with sound

cos(2( ωt + π/2), we generate a waveform like the solid ) g line in Fig. 13.3.

(13)

y

If we did not wait before striking the

y

– If we did not wait before striking the

second note, then our waveform would

be cos(

ω

t)+cos(2

ω

t). But perceptually,

the two notes would sound the same

sound, even though in actuality they

would be shifted in phase.

(14)

Channel Vocoder

Channel Vocoder

Channel Vocoder

Channel Vocoder

y Vocoders can operate at Vocoders can operate at low bit-rateslow bit rates, , 1–2 kbps1 2 kbps. To . To

(15)

y Due to Due to Phase Insensitivity Phase Insensitivity (i.e. (i.e. only the energy is only the energy is important):

◦ The waveform is "rectied" to its absolute value.

Th l b k d i l i l l f h

◦ The lter bank derives relative power levels for each frequency range.

◦ A subband coder would not rectify the signal, and would

id f b d

use wider frequency bands.

y A channel vocoder also analyzes the signal to

determine the general pitch g p of the speech (p (low — bass, or high — tenor), and also the excitation of the speech.

y A channel vocoder applies a vocal tract transfer y A channel vocoder applies a vocal tract transfer

(16)

Formant

Formant Vocoder

Vocoder

Formant

Formant Vocoder

Vocoder

y FormantsFormants: the : the salient frequency salient frequency components that are components that are

present in a sample of speech, as shown in Fig 13.5.

y Rationale: encoding only the most important frequencies

(17)

Linear Predictive Coding (LPC)

Linear Predictive Coding (LPC)

Linear Predictive Coding (LPC)

Linear Predictive Coding (LPC)

y LPC vocodersLPC vocoders extract salient features extract salient features of speech of speech

directly from the waveform, rather than

transforming the signal to the frequency domain

LPC F

y LPC Features:

◦ uses a time-varying model of vocal tract sound generated from a given excitation

generated from a given excitation

◦ transmits only a set of parameters modeling the

shape and excitation of the vocal tract, not actual

i l diff ll bit t

signals or differences small bit-rate

y About "Linear": The speech signal generated by

the output vocal tract model is calculated as a the output vocal tract model is calculated as a function of the current speech output plus a

(18)

LPC Coding Process

LPC Coding Process

LPC Coding Process

LPC Coding Process

y

LPC

starts

by deciding whether the current

y

LPC

starts

by deciding whether the current

segment is

voiced or unvoiced:

◦ For unvoiced: a wide-band noise generator g is

used to create sample values f(n) that act as input to the vocal tract simulator

◦ For voiced: a pulse train generator creates values

◦ For voiced: a pulse train generator creates values

f(n)

◦ Model parameters ap ii: calculated by using a y g

least-squares set of equations that minimize the

di erence between the actual speech and the speech generated by the vocal tract model speech generated by the vocal tract model, excited by the noise or pulse train generators

(19)

LPC Coding Process (cont'd)

LPC Coding Process (cont'd)

LPC Coding Process (cont d)

LPC Coding Process (cont d)

y

If the output values generate s(n) for

y

If the output values generate s(n), for

input values f(n), the output depends on

p

previous output sample values

:

LP

b

l l

d b

(20)

LPC Coding Process (cont'd)

LPC Coding Process (cont'd)

LPC Coding Process (cont d)

LPC Coding Process (cont d)

y

By taking the

derivative of a

and

setting

y

By taking the

derivative of a

i

and

setting

it to zero

, we get a set of J equations:

(21)

LPC Coding Process (cont'd)

LPC Coding Process (cont'd)

LPC Coding Process (cont d)

LPC Coding Process (cont d)

y

An often used method to

calculate LP

y

An often-used method to

calculate LP

(22)

LPC Coding Process (cont'd)

LPC Coding Process (cont'd)

LPC Coding Process (cont d)

LPC Coding Process (cont d)

y

Since

φ

(i j) can be de

ned as

φ

(i j)= R(|i

y

Since

φ

(i, j) can be de

ned as

φ

(i, j) R(|i

j|), and when R(0)

0, the matrix {

φ

(i, j)} is

positive symmetric, there exists a fast

p

y

(23)

LPC Coding Process (cont'd)

LPC Coding Process (cont'd)

LPC Coding Process (cont d)

LPC Coding Process (cont d)

y

Gain G

can be calculated:

y

Gain G

can be calculated:

(24)

Code Excited Linear Prediction

Code Excited Linear Prediction

(CELP)

(CELP)

y CELP is a more complex family of coders that CELP is a more complex family of coders that

attempts to mitigate the lack of quality of the simple LPC model

CELP l d f h

y CELP uses a more complex description of the

excitation:

◦ An entire set (a codebook) of excitation vectors is

◦ An entire set (a codebook) of excitation vectors is matched to the actual speech, and the index of the best match is sent to the receiver

Th l i i h bi 4 800 9 600

◦ The complexity increases the bit-rate to 4,800-9,600 bps

◦ The resulting speech is perceived as being g p p g more similar and continuous

(25)

The Predictors for CELP

The Predictors for CELP

The Predictors for CELP

The Predictors for CELP

y

CELP coders contain two kinds of prediction:

y

CELP coders contain two kinds of prediction:

◦ LTP (Long time prediction): try to reduce

d d i h i l b di th b i

redundancy in speech signals by nding the basic periodicity or pitch that causes a waveform that more or less repeats

more or less repeats

◦ STP (Short Time Prediction): try to eliminate the

(26)

Relationship between STP and LTP

Relationship between STP and LTP

Relationship between STP and LTP

Relationship between STP and LTP

y

STP

captures the

formant structure

of the

y

STP

captures the

formant structure

of the

short-term speech spectrum based on

only a

few samples

p

y

LTP,

following STP, recovers the long-term

correlation in the speech signal that

h

d

h

represents the periodicity in speech using

whole frames or subframes (1/4 of a frame)

LTP i ft

i

l

t d " d ti

– LTP is often implemented as "adaptive

codebook searching"

y

Fig 13 6 shows the relationship between

STP

y

Fig. 13.6 shows the relationship between

STP

(27)
(28)

Adaptive Codebook Searching

Adaptive Codebook Searching

Adaptive Codebook Searching

Adaptive Codebook Searching

y

Rationale:

y

Rationale:

Look in a codebook of waveforms to

nd one

that matches the current subframe

Codeword: a shifted speech residue segment

indexed by the by the lag

τ

corresponding to

the current speech frame or subframe in the

adaptive codebook

(29)

Open

Open Loop Codeword Searching

Loop Codeword Searching

Open

Open--Loop Codeword Searching

Loop Codeword Searching

y

Tries to minimize the

long term

y

Tries to minimize the

long-term

(30)

LZW Close

LZW Close--Loop Codeword

Loop Codeword

Searching

Searching

y Closed-loop search is more often used in CELP y Closed-loop search is more often used in CELP

coders — also called Analysis-By-Synthesis (A-B-S)

y speech is reconstructed and perceptual error for speech is reconstructed and perceptual error for

that is minimized via an adaptive codebook search, rather than simply considering sum-of-squares

y The best candidate in the adaptive codebook is

selected to minimize the distortion of locally reconstructed speech

y Parameters are found by minimizing a measure of

h di b h i i l d h

(31)

Hybrid Excitation Vocoder

Hybrid Excitation Vocoder

Hybrid Excitation Vocoder

Hybrid Excitation Vocoder

y Hybrid Excitation Vocoders are di erent from y Hybrid Excitation Vocoders are di erent from

CELP in that they use model-based methods to introduce multi-model excitation

y includes two major types:

◦ MBE (Multi-Band Excitation): ( ) a blockwise codec, in which a speech analysis is carried out in a speech frame unit of about 20 msec to 30 msec

MELP (M ltib d E it ti Li P di ti ) h

◦ MELP (Multiband Excitation Linear Predictive) speech codec is a new US Federal standard to replace the old LPC-10 (FS1015) standard with the application focus ( ) pp on very low bit rate safety communications

◦ a DoD speech coding standard used mainly in military applications and satellite communications

(32)

MELP Vocoder

MELP Vocoder

MELP Vocoder

MELP Vocoder

y

MBE utilizes the A-B-S scheme in parameter

y

MBE utilizes the A B S scheme in parameter

estimation:

◦ The parameters such as basic frequency, p q y,

spectrum envelope, and sub-band U/V decisions are all done via closed-loop searching

◦ The criterion of the closed loop optimization is

◦ The criterion of the closed-loop optimization is based on minimizing the perceptually weighted reconstructed speech error, which can be

(33)

MBE Vocoder

MBE Vocoder

MBE Vocoder

MBE Vocoder

y MELP: also based on LPC analysis uses a y MELP: also based on LPC analysis, uses a

multiband soft-decision model for the excitation signal g

y The LP residue is bandpassed and a voicing

strength parameter is estimated for each band

y Speech can be then reconstructed by passing the

excitation through the LPC synthesis lter

y Di erently from MBE, MELP divides the

excitation into ve xed bands of 0-500,

(34)

MELP

MELP Vocoder

Vocoder (Cont'd)

(Cont'd)

MELP

MELP Vocoder

Vocoder (Cont d)

(Cont d)

y A voice degree parameter is estimated in each y A voice degree parameter is estimated in each

band based on the normalized correlation

function of the speech signal and the smoothed p g rectied signal in the non-DC band

y Let sk(n) denote the speech signal in band k, uk(n)

(35)

MELP Vocoder (Cont'd)

MELP Vocoder (Cont'd)

MELP Vocoder (Cont d)

MELP Vocoder (Cont d)

y

MELP adopts a jittery voiced state to

y

MELP adopts a jittery voiced state to

simulate the marginal voiced speech

segments – indicated by an aperiodic

g

y

p

ag

g

y

The jittery state is determined by the

peakiness of the full-wave recti

ed LP

d

( )

residue e(n):

y

If peakiness is greater than some threshold,

(36)

13 4 Further Exploration

13 4 Further Exploration

13.4 Further Exploration

13.4 Further Exploration

→ Link to Further Exploration for Chapter 13. Link to Further Exploration for Chapter 13.

y A comprehensive introduction to speech coding may

f S '

be found in Spanias's excellent article in the textbook references.

y Sun Microsystems Inc has made available the code y Sun Microsystems, Inc. has made available the code

for its implementation on standards G.711, G.721, and G.723, in C. The code can be found from the link in the Chapter 13 le in the Further Exploration

in the Chapter 13 le in the Further Exploration section of the text website

y The ITU sells standards; it is linked to from the text

website for this Chapter

Referensi

Dokumen terkait

Evaluasi yang dilakukan dalam penyuluhan gizi ini dengan memberikan pre-testyang dilakukan pada pengambilan data awal berupa karakteristik contoh dan profil sekolah

Berdasarkan hasil observasi penulis juga menanyakan kepada sebagian masyarakat Desa Karangbendo khususnya yang sudah berkeluarga, penulis menemukan

Dalam penelitian yang berjudul “Analisis Pengaruh Usia, Pendidikan, Pendapatan Dan Kelas Sosial Terhadap Perilaku Konsumen Dalam Pembelian Sepeda Yamaha Pada Mataram Sakti

Pasal 5 UUPA: “Hukum Agraria yang berlaku atas bumi, air, dan ruang angkasa adalah hukum adat, sepanjang tidak bertentangan dengan kepentingan nasional dan negara, yang

PENERAPAN MODEL QUANTUM LEARNING DALAM UPAYA MENINGKATKAN MINAT DAN PRESTASI BELAJAR MATEMATIKA..

Tujuan penelitian ini adalah untuk menguji pengaruh Dana Alokasi Umum (DAU, Pendapatan Asli Daerah (PAD) dan belanja modal terhadap pendapatan per kapita.

Hasil Koreksi Aritmatik Perbandingan Harga Penawaran. Terkoreksi Terhadap HPS

Banyak orang menganggap perkawinan dini akan menambah tingkat perceraian karena kegagalan dalam rumah tangga, disebabkan oleh cara fikir pada pasangan suami-isteri yang