• Tidak ada hasil yang ditemukan

Audio Inpainting

N/A
N/A
Protected

Academic year: 2021

Membagikan "Audio Inpainting"

Copied!
28
0
0

Teks penuh

(1)

Amir Adler, Valentin Emiya, Maria Jafari, Michael Elad, R´

emi Gribonval,

Mark D. Plumbley

To cite this version:

Amir Adler, Valentin Emiya, Maria Jafari, Michael Elad, R´

emi Gribonval, et al.. Audio

In-painting. IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical

and Electronics Engineers, 2012, 20 (3), pp.922 - 932.

<10.1109/TASL.2011.2168211>.

<inria-00577079>

HAL Id: inria-00577079

https://hal.inria.fr/inria-00577079

Submitted on 16 Mar 2011

HAL

is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not.

The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire

HAL

, est

destin´

ee au d´

epˆ

ot et `

a la diffusion de documents

scientifiques de niveau recherche, publi´

es ou non,

´

emanant des ´

etablissements d’enseignement et de

recherche fran¸

cais ou ´

etrangers, des laboratoires

publics ou priv´

es.

(2)

a p p o r t

d e r e c h e r c h e

0

2

4

9

-6

3

9

9

IS

R

N

IN

R

IA

/R

R

--7

5

7

1

--F

R

+

E

N

G

Domaine Audio, Speech, and Language Processing

Audio Inpainting

Amir Adler — Valentin Emiya — Maria G. Jafari — Michael Elad — Rémi Gribonval —

Mark D. Plumbley

N° 7571

(3)
(4)

Centre de recherche INRIA Rennes – Bretagne Atlantique

AmirAdler , ValentinEmiya, MariaG. Jafari, Mihael Elad , RémiGribonval,Mark D. Plumbley

Domaine: Équipe-ProjetMetiss

Rapportdereherhe n°7571Marh 16,201124pages

Abstrat:

We propose the Audio Inpainting framework that reoversaudio intervals distortedduetoimpairmentssuhasimpulsivenoise,lipping,andpaketloss. Inthisframework,thedistortedsamplesaretreatedasmissing,and thesignal is deomposed into overlappingtime-domainframes. Therestoration problem is then formulatedasan inverse problem peraudioframe. Sparse representa-tionmodelingisemployedperframe,andeahinverseproblemis solvedusing theOrthogonalMathingPursuitalgorithmtogetherwithadisreteosineora Gabor ditionary. Theperformaneofthis algorithmisshownto be ompara-ble orbetterthan state-of-the-artmethodswhen bloks ofsamplesof variable durationsaremissing. Wealsodemonstratethatthesizeoftheblokofmissing samples, rather than the overall number of missing samples, is a ruial pa-rameterforhighqualitysignalrestoration. Wefurtherintrodueaonstrained MathingPursuitapproahforthespeialaseofaudiodelippingthatexploits thesignpatternoflippedaudiosamplesandtheirmaximalabsolutevalue,as wellasallowingtheusertospeifythemaximumamplitudeofthesignal. This approah is shown to outperforms state-of-the-artand ommerially available methods foraudiodelipping.

Key-words: Inpainting,lipping,sparserepresentation,mathingpursuit.

A.AdlerandM.EladarewiththeComputerSieneDepartment,TheTehnion,Haifa 32000, Israel. V.EmiyaandR.GribonvalarewithINRIA,CentreInriaRennes-Bretagne Atlantique,35042RennesCedex,Frane. M.G.JafariandM.D.PlumbleyarewithQueen MaryUniversityofLondon,CentreforDigitalMusi,DepartmentofEletroniEngineering, LondonE14NS,U.K.,(e-mail:maria.jafariele.qmul.a.uk).

This workhas been submittedto IEEE Transations onAudio Speeh and Language Proessing. Partofthisworkhasbeenpresented atthe IEEEInternational Confereneon Aoustis,SpeehandSignalProessing(ICASSP)in2011[1℄.

ThisworkwassupportedbytheEUFramework7FET-Openprojet FP7-ICT-225913-SMALL:SparseModels,AlgorithmsandLearningforLarge-Saledata.

(5)

Résumé: Nousintroduisonsleoneptd'InpaintingAudiopourlarestauration deportionsdedonnéesaudiodistorduespardesdégradationstelsqueleslis, lelippingoulapertedepaquets. Danseontexte,lesdonnéesdistorduessont onsidéréesommemanquantesetlesignalestdéomposédansledomaine tem-porelen trames. Le problèmederestaurationest formuléommeunproblème inversedanshaquetrame. Celle-iestmodéliséeparunereprésentation pari-monieuseetleproblèmeinverseestrésoluvial'algorithmeOrthogonalMathing PursuitenutilisantunditionnairedeosinusdisretoudeGabor. Les perfor-manesobtenuessontomparablesàl'étatdel'art,avedesblosd'éhantillons manquants de durée variable. Nous montrons également que la qualité de la restauration dépend davantagede lataille desblos d'éhantillonsmanquants que du nombre total d'éhantillons manquants. Nous introduisons enn un algorithme de type Mathing Pursuit ave ontraintes pour le as partiulier dudelippingaudio,danslaquellesontexploitéeslespropriétésd'amplitudedes éhantillonssaturés: signe,amplitudeminimumetmaximum. Lesperformanes obtenuessontsupérieuresàellesdel'étatdel'artetàdelogiielsommeriaux pourledelipping.

Mots-lés : Inpainting, lipping, représentations parimonieuses, mathing pursuit.

(6)

0

0.01

0.02

0.03

−1

0

1

Time (s)

Amplitude

(a) Speeh signal orrupted by liks (ir-les).

0

0.01

0.02

0.03

−1

0

1

Time (s)

Amplitude

(b)Clippedversion(blak)ofaspeehsignal (gray).

()Theimageinpaintingproblem: reov-eryofloally-hiddenpixels.

Figure1: Examplesofrestorationproblemsrelatedtoinpainting.

1 Introdution

Speeh and musi signalsare often subjetto loalized distortions, where the intervalsof distorted samples are surrounded by undistorted samples. Exam-ples inlude impulsive noiseorliks(see Fig.1a), lipping (seeFig. 1b), CD srathes,paketlossin ordlessphonesorVoieoverIP (VoIP)andmore. In suhsituations,thedistorted samplesanbetreatedasmissing. Arestoration algorithm is employedto reonstrut the missing samples,in a similarwayas for image inpainting(see Fig.1). However,in the audioeld,suh problems have been treated separately and depending on the ontext, they have been referredtoasaudiointerpolation[26℄,extrapolation[3,7,8℄,imputation[9,10℄, indution [11℄,(bandwidth)extension[1215℄oronealment[16,17℄.

Substantialeorthasbeenfoused onthe restorationof audiosignals or-rupted by liksdue to old reordingsorsrathed CDs (seeFig.1a). In this problem,intervalsoforruptedsamplesfrom

20

µ

sto

4

ms[4℄ourat ran-domloations. Typialapproahesemployautoregressive(AR)modeling[2,3℄, or Bayesian estimation to reover the orrupted samples [4℄. Other methods utilize neuralnetworks[18℄ or sinusoidalmodeling[5,8℄. A relatedproblem is automati speeh reognitionin the presene of isolated noisy samples. This problem istreatedin [10℄ withaompressivesensingapproah in the spetro-gramimage domain,andbysolvingan

l

1

regularizedleastsquaresproblem.

(7)

Anotherimportant thoughless often addressed problem is audio lip-ping[6,7,19℄,whihreferstothetrunationofthewaveformbeyondathreshold when the maximum range in an aquisition systemis exeeded (see Fig. 1b). Thelippedsamplesarearrangedingroupsandtheirloationsaredetermined bytheamplitudeofthesignal(ratherthanbeingrandomlyspread). The delip-ping problemispartiularly hallengingfor thisreasonandastheinformation arriedbythehighest-amplitudesamplesisompletely absent.

Long intervals of samples may be lost during transmission over ordless phones orin VoIP systems, where the problem is addressedusing paket loss onealmentalgorithms [16,17℄. Missing intervals lengthsare in the range of

5

msto

60

ms,whiharelosetothetypialdurationforthepseudo-stationarity of audiosignals. Thelowlateny requirement in theVoIP ase resultsin rel-ativelysimple algorithms; however,estimating missingpakets in peer-to-peer repositories is a new appliation where higher quality reonstrution an be expeted(as thelatenyrequirementislessstringent).

Finally, the unreliable or missing audio data an be time-frequeny re-gions[5,9,11,14,20℄,inlassiationappliationslikeautomatispeeh reogni-tion[9,20℄orsoureseparationwithtime-frequenyloalizedinterferenethe phraseaudio inpainting hasbeen usedone in thisspei ase [11℄. Band-width extension [1215℄ is another important time-frequeny-domain applia-tion,wherehighfrequenyontentisestimatedfromthelowfrequenyontent in ordertoprovidehighqualityaudio.

Inthispaper,wepresentauniedframeworkfortherestorationofdistorted audiodata,leveragingtheoneptofImageInpainting [2123℄. Intheproposed framework, termed Audio Inpainting, the distorted data is assumed missing and its loation is assumed to be known a-priori. We further employ Sparse Representations(SR),whihhavebeendemonstratedtofaithfullymodelaudio signals[24,25℄andto addresstheimageinpaintingframework[22,26,27℄. The proposedapproahisdiretlybaseduponthosepriorworks.

Theontributionsofthispaperarefour-fold:

a) Audio inpainting isdened asan inverse problem, basedupon theonept ofimageinpainting.

b) Aframeworkforaudioinpaintingin thetimedomain isproposed,basedon sparserepresentations. It exploits twopossibleditionaries(disrete osine andGabor)knownto provideauratesparsemodelsforaudiosignals. ) TheOrthogonalMathingPursuit (OMP)algorithmforaudioinpaintingis

adapted,in partiulartodealwiththepropertiesoftheGaborditionary. d) Aonstrainedmathingpursuitapproahisappliedtosigniantlyenhane

theperformaneforaudiodelippingproblems.

Thispaperisorganizedasfollows. InSetion2,audioinpaintingis formal-izedasaninverseproblem. TheproposedframeworkisintroduedinSetion3 inludingthesparsemodelsused fortime-domain audioinpainting. The adap-tation ofthe OMP algorithm foraudio inpaintingin the timedomain and for audiodelippingispresentedinSetion4. Severalexperimentsareproposedin Setion 5,whilewedisussourndingsanddrawonlusionsinSetion6.

(8)

2 Audio Inpainting Problem Statement

We dene audio inpainting as a general problem enountered in many appli-ations: oneobserves apartial set of reliable audio data while the remaining unreliabledataiseithertotallymissingorhighlydegraded;theunreliabledata isonsideredmissing anditisestimatedfromthereliabledataportion.

The general formulation of audio inpainting is given in Setion 2.1 while severalpartiulartime-domainasesaredetailedin Setions2.2and2.3.

2.1 Formulation of audio inpainting We onsider a vetor

s

R

L

of audio data and an a-priori known partition

{

I

m

, I

r

}

of the support

I

,

{1

,

2

,

· · ·

, L

}

of

s

:

I

m

I

and

I

r

,

I

\

I

m . We assume that the oeients

s

(

I

m

)

are either missing or masked by a severe distortion. Thus, theobserveddata

y

R

L

oinides with

s

on

I

r

only. The audio inpainting problem is dened as the reovery of the oeients

s

(

I

m

)

basedontheknowledgeof: 1. thereliabledata

y

r

,

y

(

I

r

) =

s

(

I

r

)

, 2. thepartition

{

I

m

, I

r

}

,

3. additionalinformationabouttheobservedsignal,

4. and, optionaly, informationabout themissing data (see e.g. in thease oflippingbelow).

Inmatrixform,thereliabledata

y

r

resultfromthelinearmodel

y

r

=

M

r

s

,

(1)

where

M

r

istheso-alledmeasurementmatrixobtainedfromthe

L

×

L

identity matrix

I

L

byseletingtherows

I

r

assoiatedwiththereliableoeientsin

s

. In a similar way, the missing data to be reoveredare

s

(

I

m

) =

M

m

s

, where

M

m

onsistsoftherows

I

m

in

I

L

.

Inthegeneralaudioinpaintingframework,audiodataanbeeithersamples in waveformsor oeientsin transforms liketime-frequeny representations. The problemformulation aboveanbe usedfor multi-dimensionalsignalslike multihannelwaveformsortime-frequenyoeients,bysimplyreshapingthe signalmatrixasavetor

s

.

Intherestofthispaper,weonlyonsidertheinpaintingofmissingsamples in a single-hannel waveform. The multi-dimensional ase is disussed in the onlusion(seeSe.6).

2.2 Inpainting samples distorted by impulsive noise Inthepartiularaseofasignalorruptedbyimpulsivenoisesuhasliks(see Fig. 1a),

I

m

is aset of integersbetween

1

and

L

and must be estimated in a preliminarystage. Oneoftenonsidersthatthedistortedsamplesareorrupted byaGaussiannoise

n

withhighvariane. Hene,theompleteobservedsignal inludes boththereliablesamples

y

r

anddistorted ones

y

m :

(

y

r

=

M

r

s

y

m

=

M

m

s

+

n

,

(2)

(9)

wherethesamples

M

m

s

in

y

m

aremaskedby

n

sothattheyareonsideredas unknown.

2.3 Inpainting intervals of missing samples

In the ase where intervals of samples are missing, due to paket lossduring transmissionor tomaskingbyaudible interferenes,

I

m

isomposedof groups of onseutive integers: the samples

s

(

I

m

)

are totally missing and one only observes

y

r

=

M

r

s

.

Intheaseoflippedsignals,thesamplestobeestimatedarealsoarranged in intervals of onseutive samples, as depited in Fig. 1b. Their loations dependontheamplitudeofthesignal,suhthat

I

m

,

{

n

|1

n

L,

|

s

(

n

)| ≥

θ

lip

}

,

(3) where

θ

lip

is the lipping level. One observes both the un-lipped, reliable samples

y

r

andthelipped, maskedsamples

y

m

(

y

r

=

M

r

y

=

M

r

s

y

m

=

M

m

y

=

M

m

sign (

s

)

θ

lip

,

(4)

where

sign (·)

is theelement-wisesign funtion. As presented in thenext se-tions, the information provided by

y

m

, even though very rude a sign (per sample)andthelippinglevel,stillsubstantiallyenhanestheestimation per-formane.

3 Time-domain framework and models

Theproposed frameworkfousesontime-domainaudioinpainting. It relieson aframe-basedproessing,asdetailedinSetion3.1andonthesparse represen-tationsmodelingofaudiosignals,aspresentedin Setion3.2. Twoditionaries usedin thismodelingareintroduedinSetion 3.3.

3.1 Frame-based proessing and reonstrution Asin manyaudioproessingtasks,thesignalisloally proessed:

ˆ bysegmentingitintoframes,

ˆ byindependentlyinpaintingeah frame,

ˆ andbysynthesizingthefull restoredsignalusing theoverlap-add(OLA) method[28℄.

Wedeomposethesignalintooverlappingframesindexedby

i

,startingattime

t

i

andweightedbyananalysiswindow

w

a

withlength

N

. Bystraightforwardly adapting to the loal frames theproblem statementdened for thefull signal in Setion2,thereliablesamplesin frame

i

anbewrittenas

y

r

i

=

M

r

(10)

where

M

r

i

isthemeasurementmatrixofthe

i

-thframeobtainedfrom

M

r

and

s

i

(

t

)

,

s

(

t

+

t

i

)

w

a

(

t

)

isthe windowedframe dened for

0

t

N

1

. We alsodenethesupports

I

r

i

and

I

m

i

ofthereliablesamplesandofthemissingor maskedsamples,respetively. One theestimation

b

s

i

of

s

i

bysomeinpainting algorithmisahieved,thereonstrutionof thefullsignalisobtainedas

b

s

(

t

)

,

X

i

w

s

(

t

t

i

)

b

s

i

(

t

t

i

)

(6)

where

w

s

is the synthesiswindow suh that

P

i

w

s

(

t

t

i

)

w

a

(

t

t

i

) = 1

,

t

.

Intheproposedapproahes,weutilized

64

ms-frameswith

75%

overlap,a ret-angularwindowfor

w

a

andasine windowfor

w

s

.

3.2 Sparse Representations modeling of audio frames IntheSparseRepresentations(SR)modelingframework[23℄,itisassumedthat eahframeiswellapproximatedbyasparselinearombinationoftheolumns ofa(possiblyoveromplete)ditionary:

s

i

Dx

i

,

(7)

where

D

R

N

×

K

D

is theditionary,

N

K

D

and

x

i

R

K

D

×

1

is the repre-sentationvetorofthe

i

-thframe.

x

i

isassumedtobesparse,i.e. tohavefew non-zerooeientsompared to

N

. Asaonsequene,weanalsoutilizethe SRmodelfortheobservedreliablesamplesin eahframe

y

r

i

,

M

r

i

s

i

M

r

i

Dx

i

.

(8)

We propose to reover the unknown samples

s

i

(

I

m

i

)

by estimating as

x

ˆ

i

the(sparse)representationvetorofeah frame,givenonlytheleanobserved samples(8)andlimitedsideinformation(forthelippingase)

b

s

i

(

I

m

i

) =

M

m

i

D

x

ˆ

i

.

(9)

This formulation inluding the notion of sparsity was rst introdued for image inpainting [22℄ with a global treatment with global transforms. Then, eortsweredediatedtoworkonloalpathessimilartoaudioframesand to introdue alearned ditionary to improve the inpainting results [26℄; they have been improved [27℄ by modeling betterthe problem and by learningthe ditionarydiretlyfromtheorruptedimage.

3.3 Ditionaries

We propose two optionsto hoose aditionary

D

in whih audio signals are sparse: theDisreteCosineTransformditionary,andaGaborditionary. Both are widely used for sparse models of audio signals [24,25,29℄. Other xed ditionaries suh asmultisale DCT [30℄,orlearnedditionary [26℄ spei to partiularinpaintingtasksmayalso beinterestingoptions.

(11)

3.3.1 DisreteCosine Transform (DCT) ditionary

TherstoptiononsistsinawindowedDisreteCosineTransform(DCT) over-omplete ditionary

D

c

=

£

d

c

0

, . . . ,

d

c

K

c

1

¤

, atom

j

being dened for

0

j

K

c

1

and

0

t

N

1

as

d

c

j

(

t

)

,

w

d

(

t

) cos

µ

π

K

c

µ

t

+

1

2

¶ µ

j

+

1

2

¶¶

(10)

where

K

c

isthesizeoftheDCTditionaryi.e. thenumberofdisrete frequen-ies and

w

d

isaweightingwindowset bytheuser. Thishoieis motivated bythewideuseofwindowedDCTatomsforsparserepresentationofaudio sig-nals[25℄. However,thezerophaseof

D

c

atomsisnotadaptedto audiosignals thataremadeupwithsinusoidalomponentswithinitialphasedistributed be-tween

0

and

2

π

. As aonsequene,theDCTmodelatsasabasisratherthan asasynthesismodelandthesignalsarenotreallysparsein

D

c

. 3.3.2 Gaborditionary

Theseondoptionaimsatsparselymodelingarbitrary-phasesinusoidal ompo-nentsbyusingaGaborditionary

D

g

=

n

d

g

(

j,ϕ

)

o

(

j,ϕ

)

Γ

inwhihtheatomsare index byaontinuousset

Γ

,

J

0

, K

g

1

K

×

[0

,

2

π

[

andaredenedas

d

g

j,ϕ

,

w

d

(

t

) cos

µ

π

K

g

µ

t

+

1

2

¶ µ

j

+

1

2

+

ϕ

,

(11)

where

K

g

isthesizeoftheGaborditionary.

Notethat in theurrentaseofaontinuously-indexed ditionary,eq. (7), (8)and(9)arestillvalid ifwedene

D

g

x

i

=

X

(

j,ϕ

)

Γ

x

i

(

j,ϕ

)

6

=0

d

g

j,ϕ

x

i

(

j, ϕ

)

(12) where

x

i

=

{

x

i

(

j, ϕ

)}

(

j,ϕ

)

Γ

. Indeed, eq.(12)is anite sumsineonly afew oeientsinthesparserepresentationvetor

x

i

arenon-zero. Thealgorithmi aspetsofthisdeompositionwillbeaddressedinSetions4.2and4.3.

4 Audio inpainting algorithms based on Orthog-onal Mathing Pursuit

Foragiven ditionary

D

, we usethe OrthogonalMathingPursuit algorithm toperformtheinpaintingofanaudioframe,aspresentedin Setion4.1. Some ditionary-dependentalgorithmistagesarethendetailedinSetion4.2and4.3. An extension of thealgorithm spei to delipping is nally detailledin Se-tion4.4.

(12)

Table1: OMPInpaintingAlgorithm

y

r

i

,

M

r

i

,

D

=

{

d

j

}

j

Γ

, K

OMP

, ǫ

OMP

i

Initialization: ˆ Ditionary

D

e

=

n

e

d

j

o

j

Γ

=

M

r

i

×

D

×

W

, where

W

jj

= 0

for

j

6=

j

and

W

jj

=

k

M

r

i

d

j

k

2

1

. ˆ Iterationounter

k

= 0

ˆ Supportset

0

=

ˆ Residual

r

0

=

y

r

i

Sparse support seletionand oeientsestimation: Repeatuntil

k

=

K

OMP or

k

r

k

k

2

2

< ǫ

OMP

i

ˆ Inrementiterationounter

k

=

k

+ 1

ˆ Selet atom: nd

j

= arg max

j

Γ

|

<

r

k

1

,

d

f

j

>

|

(14)

ˆ UpdateSupport

k

= Ω

k

1

j

ˆ Updateurrentsolution

x

k

= arg min

u

k

y

r

i

D

e

k

u

k

2

(15) ˆ UpdateResidual

r

k

=

y

r

i

D

e

k

x

k

Output:

x

ˆ

i

=

Wx

k

4.1 Orthogonal Mathing Pursuit (OMP) algorithm for inpainting

Theapproahemergesfromthefollowingoptimizationproblem

ˆ

x

i

= arg min

x

k

x

k

0

s.t.

k

y

r

i

M

r

i

Dx

k

2

2

ǫ

i

.

(13)

foragivenapproximationerrorthreshold

ǫ

i

. The

l

0

pseudo-norm

k

x

k

0

ountsthenon-zerosomponentsofthevetor

x

, leading to an NP-hard problem [31,32℄. Therefore, a diret solution of (13) is infeasible. An approximate solution is given by applying the Orthogonal Mathing Pursuit (OMP) algorithm [24,33℄, whih suessivelyapproximates thesparsestsolution. TheinpaintingOMPalgorithm[23℄,detailedin Table1, is aslightly modiedversionof thelassialOMP algorithmin thesense that allditionaryolumns

d

e

j

areinternallynormalizedtounitnorm,usingdiagonal matrix

W

,duetotheavailabilityofonlytheleansamples. Thealgorithmstops iteratingassoonaseither theresidualenergydrops belowthethreshold

ǫ

OMP

i

orthemaximumsparsitylevel

K

OMP

(13)

4.2 Atom seletion

When using the DCT ditionary, the algorithmi stage for the atom sele-tion (14) at eah iteration is well known: it onsists of expliitly omputing theorrelationmentionedin eq.(14)ofTable1,orofusingafasttransform.

However, the atom seletion (14) needs explaining in the ase of the Ga-bor ditionary in order to deal with the ontinuous indexing. Without any approximation,the deomposition withontinuously-indexedatoms

d

g

anbe expressedusingpairsofatomsinadisreteditionarywith

K

g

frequenybins. Pairsofatomsanbeeitheronjugateomplexexponentials[24,34℄,orpairsof osine and sineat thesamefrequenyand with azerophase[29℄. Inorderto usethislatteroption,weintroduesineatoms

d

s

j

as

d

s

j

(

t

)

,

w

d

(

t

) sin

µ

π

K

g

µ

t

+

1

2

¶ µ

j

+

1

2

¶¶

(16)

andwedenetheunit-normversion

d

e

c

j

and

d

e

s

j

oftheatoms

d

c

j

and

d

s

j

, respe-tively,asdesribedinTable1.

Ateah iteration

k

,seletingthebest orrelatedGaboratom

d

g

j

(eq.(14)) isthenequivalenttopikingthepair

³

e

d

c

j

,

e

d

s

j

´

suhthat

j

= arg min

j

∈J

1

,K

D

/

2

K

°

°

°

r

k

1

d

e

c

j

x

b

c

j

d

e

s

j

b

x

s

j

°

°

°

2

2

(17) where

b

x

c

j

=

­

e

d

c

j

,

r

k

1

®

­

e

d

c

j

,

e

d

s

j

®­

e

d

s

j

,

r

k

1

®

1

­

e

d

c

j

,

e

d

s

j

®

2

b

x

s

j

=

­

e

d

s

j

,

r

k

1

®

­

e

d

c

j

,

e

d

s

j

®­

e

d

c

j

,

r

k

1

®

1

­

e

d

c

j

,

e

d

s

j

®

2

.

(18)

This partiular seletion stage hasbeen proposed in [34, Appendix II℄ for onjugate Gaborhirpatoms and the use of bloks of oherentatoms in MP andOMPhasbeenfurtherstudiedin[35℄. Intherestritedasewhereatomsin aandidatepairareunorrelated(i.e.

D

e

d

c

j

,

d

e

s

j

E

= 0

),eq.(17)anbesimplied as

j

= arg max

j

∈J

1

,K

D

/

2

K

D

e

d

c

j

,

r

k

1

E2

+

D

e

d

s

j

,

r

k

1

E2

andtheresultingalgorithm is equivalent to the existing Modied Mathing Pursuit [36℄ and Blok OMP algorithms[37℄.

4.3 Solution update

When using the DCT ditionary, the solution update (15) performed at eah iterationusuallyonsistsofaleast-squareprojetion.

In the ase of the Gabor ditionary, one the best atom has been added to the set of atoms seleted in previous iterations, the update of the urrent solution(15)anbeperformedby aleast-squareprojetionusing theseleted Gabor atoms

n

d

g

j,ϕ

j

o

j

k

(14)

However,this update an be improved by using the equivalent osine and sineatoms

©

d

c

j

,

d

s

j

ª

j

k

intheleast-squareprojetion:notonlyamplitudesbut alsophasesarethusupdatedateahiteration,leadingtoabetterapproximation ofthesignal.Asfarasweknow,suhanimplementationofOMPwithaGabor ditionaryhasneverbeenproposed before.

4.4 Algorithmi enhanements for inpainting lipped sig-nals

4.4.1 The `min'delippingonstraint

Inpainting lipped signals an be performed with the algorithm presented in Setion 4.1,bytreatingthelipped samplesasompletelyunknown. However, extrainformationinherenttothisproblemanbeintegratedasadditional on-straints into equations (13). Constrained optimization approahes were also utilizedin theaseof

l

1

-minimizationforimagedesaturation[38℄ andofaudio delipping basedonaband-limitedassumption[7℄.

Let

θ

lip

bethelippinglevel(whihanbeeasilyestimatedasthemaximum absolute value among the observed samples) and

M

m+

i

(resp.

M

m-i

) be the matrixsuhthat

M

m+

i

s

i

(resp.

M

m-i

s

i

)isthevetorofpositive(resp. negative) lippedsamples. Thematries

M

m+

i

and

M

m-i

areknownfromtheloationand the signof thelipped samples. Themissing samples should satisfythe `min' onstraints

M

m+

i

s

i

θ

lip and

M

m-i

s

i

≤ −

θ

lip

.

(19) 4.4.2 The `max'delippingonstraint

The set of `min' onstraints an be further augmented by `max' onstraints, introduinganupperlimitontheabsolutevalueofthereoveredsamples

θ

max , asfollows

M

m+

i

s

i

θ

max and

M

m-i

s

i

≥ −

θ

max

.

(20) The upper limit

θ

max

is an optional parameter that annot be estimated automatially in astraightforwardwaybut maybe adjusted manuallyby the user.

4.4.3 The `minmax'onstrained SRproblem

Using bothsets ofonstraints,the`minmax' delipping versionofthe

l

0

-norm minimization problem(13)isgivenby

ˆ

x

i

= arg min

x

k

x

k

0

s.t.

k

y

r

i

M

r

i

Dx

k

2

2

ǫ

i

θ

max

M

m+

i

Dx

θ

lip

θ

max

M

m-i

Dx

≤ −

θ

lip (21) where

θ

max

(15)

Table2: Summaryoftheproposedalgorithms: eahrowindiatesthealgorithm usage(generalinpaintingordelipping),dependingonpossibleadditional on-traints,while ditionaries vary aross olumns. Algorithm nomenlature ap-pearswithinquotesin eahell.

Additionalspeiation DCT Gabor

Inpainting `OMP-C'[Table1℄ `OMP-G' [Table1℄ Min-onstraint delip-ping `OMP-C-min' [Ta-ble1+eq.(22)℄ `OMP-G-min' [T a-ble1+eq.(22)℄ Minmax-onstraint de-lipping `OMP-C-minmax' [Ta-ble1+eq.(23)℄ `OMP-G-minmax' [T a-ble1+eq.(23)℄

4.4.4 OMPdelipping algorithm

Weproposeapproximatesolutionsbyinorporatingtheonstraints(19)and(20) intothenalsolutionupdatestageoftheOMPInpaintingalgorithm. Inother words, the OMP Inpainting algorithm presented in Table1 is applied, in or-der to selet the sparse support. One thesupport

k

is seleted, the sparse representationoeientsarere-estimatedbysolvingthefollowingonstrained optimizationproblem:

x

k

= arg min

u

k

y

r

i

D

e

k

u

k

2

s.t.

(

M

m+

i

DWu

θ

ˆ

lip

M

m-i

DWu

≤ −

θ

ˆ

lip (22)

in theaseofthe`min'onstraint,or

x

k

= arg min

u

k

y

r

i

D

e

k

u

k

2

s.t.

(

ˆ

θ

max

M

m+

i

DWu

θ

ˆ

lip

θ

ˆ

max

M

m-i

DWu

≤ −

θ

ˆ

lip (23)

fortheaseofthe`minmax'onstraint.Theonstraintsarelinear,thusstandard onvexoptimizationsolversanbeemployed.

Intheory, the solution of the onstrained problem maynot exist. We ob-servedthatthisoursveryseldominpratie. Whenevernosolutionexists,the frameisrestoredusingtheunonstrainedminimization

x

k

= arg min

u

k

y

r

i

D

e

k

u

k

2

.

5 Experimental Results

A summary ofallversions ofthealgorithm presentedin this paperis givenin Table2. This setionreportsthe major trendsthrough dierentexperiments. TheperformanemeasuresareintroduedinSetion5.1. Thetestmaterialand parameter settings are presented in Setions 5.2 and 5.3. The global perfor-maneofalltheproposedinpaintingalgorithmsandamoredetailedinpainting experimentarepresentedinSetion 5.4. Setion5.5nally fousesonthease oflipping.

(16)

5.1 Performane measures

Theperformaneanbeassessedbythesignal-to-noiseratio(SNR) omputed onthefullsignals,dened by

SNR full

(

s

,

b

s

)

,

10 log

k

s

k

2

2

k

s

b

s

k

2

2

.

(24) WhileSNR full

givesanoverviewoftheglobalqualityoftherestoredsignal, itanbedeomposedas SNR full

(

s

,

b

s

) =

SNR

m

(

s

,

b

s

) + 10 log

k

s

k

2

2

k

s

(

I

m

)k

2

2

(25) where SNR

m

(

s

,

b

s

)

,

10 log

k

s

(

I

m

)k

2

2

k

s

(

I

m

)

b

s

(

I

m

)k

2

2

.

(26)

SNR

m

reetsthereonstrutionperformaneperestimatedsampleand dif-fersfromSNR

full

byanosetthatdoesnotdependontheinpaintingalgorithm. Indeed,theseondtermin Eq.(25)isabiasthat reetsthedegradationrate only. Thus, SNR

m

will bepreferredto showsomedetailed performane, with-out the inuene of this bias, while SNR

full

will be used to assess the global restorationquality.

Note that in theontext of a pereptually-motivated evaluation of the re-sults,SNRmeasuresmaybereplaedbysoresfromlisteningtestsorby obje-tivemeasures. Totheauthors'knowledge,existingsubjetivetestprotoolsand objetivemeasuresforaudioqualityassessmentarenotdediatedtotheev alua-tionofaudioinpainting. Indeed,theygenerallyapplytosignalsthatsuerfrom globaldegradationratherthanloalones,inappliationssuhasoding[39,40℄ orsoureseparation[41℄. Thus,workingontheevaluationof audioinpainting isanimportantfuturediretionto onsider.

5.2 The olletion of tested signals Theexperimentsareonduted usingthreedatasets:

ˆ Musi16kHz: aset of musi signals sampled at 16 kHz, this sampling ratebeingagoodtrade-o betweenaudioquality andomputational re-quirements.

ˆ Speeh16kHz: asetofspeehsignalssampledat16kHz,i.e. highquality speeh for whih resultsan be ompared to theprevious ase of musi signals.

ˆ Speeh8kHz: a set of speeh signals sampled at 8 kHz, representing phone-qualityspeeh;thisdatasetwasobtainedbydownsamplingthe pre-vious16kHzspeehdataset.

(17)

Eah dataset is omposed by ten 5-seonds signals from the 2008's Signal Separation Evaluation Campaign [42℄ and are freely available online

1

. They inludealargediversityofaudiomixturesandisolatedsoures: maleandfemale speeh from dierent speakers, singing voie, pithed and perussive musial instruments.

Inorderto haveomparabledegradationsamongallsignalsinthelipping experiments(Setion 5.5),eahoriginal signalisnormalizedso that the maxi-mumamplitudeis1.

5.3 Parameter settings

A spei training datasetwasused to tune the parametersof the inpainting algorithms manually and without ne adjustment. The values of the tuned parametersareshowninTable3.

Table3: Parametersettings Parameter Value

Framelength 64ms(i.e.

N

,

512

at8kHz,

N

,

1024

at16kHz)

FrameOverlap

75%

Analysiswindow

w

a

retangular Synthesiswindow

w

s

sine

Ditionarysize

K

c

= 2

N

(DCT),

K

g

=

N

(Gabor) Atomweightingwindow

w

d

retangular(

w

d

=

w

a

)

ǫ

OMP

i

ǫ

×#

I

r

i

where

ǫ

,

10

6

isaxed param-eter and

#

I

r

i

is the number of reliable samplesin the

i

thframe

K

OMP

N

4

ˆ

θ

lip

k

y

k

ˆ

θ

max

θ

lip 5.4 Inpainting experiments

5.4.1 Globaleet ofthe duration of missingintervals

Theinpaintingperformaneoftheproposedalgorithmswasevaluatedfor vari-able durations of missing intervals of samples. Eah experiment tested the performanewiththeentireolletionofsignals,foraxedmissinginterval du-rationthatrepeatedperiodiallyevery100ms. Themissingintervalsdurations wereintherangeofafrationof1ms(orrespondingtoimpulsivenoiseorliks distortions)andupto10ms(orrespondingtopaketlosssenarios).

Foromparison,weusedthemethodbyJanssen[2℄basedonlinearpredition andareonstrutionmethodbasedonsplineinterpolationtheMatlab`interp1'

1

(18)

10

0

10

1

10

20

30

40

50

Speech@8kHz

Missing Interval Duration (ms)

SNR

full

(dB)

Click

removal

Packet loss

concealment

10

0

10

1

10

20

30

40

50

Speech@16kHz

Missing Interval Duration (ms)

SNR

full

(dB)

10

0

10

1

10

20

30

40

50

Music@16kHz

Missing Interval Duration (ms)

SNR

full

(dB)

Figure 2: Performane of inpainting algorithms as afuntion of the duration of missing intervalsfor eah dataset(subgures). The missing intervals were generatedperiodiallyevery100ms(atotalof50equaldurationmissingintervals persignal).

funtion. Thesemethodsarerepresentativesofthetwomainfamiliesof state-of-the-artmethodsforinterpolationofaudiodataandareabletohandlemultiple bloksofonseutivemissing samples. InJanssen'smethod, theautoregressive modelorderisset to

3

N

miss

+ 2

,where

N

miss

isthenumberofmissingsamples in theurrentframe, asreommendedbytheauthors.

The results are presented in Fig. 2. On the average, the OMP algorithm with the Gabor ditionary provides an advantage of 1-2dB ompared to the OMP with DCTditionary. ForMusi16Khz and Speeh8Khz these algo-rithms also outperform Janssen's approah for short missing intervals of du-rations up to 1ms. Fordurationsabove 1ms Janssen's approah provides an advantageof1-3dB.ForSpeeh16Khz,Janssen'smethodperformsbetterthan theproposedones,linearpreditionbeingpartiularlywell-adaptedforspeeh. However, using moreinformation with theproposed method an enhane the performane,aswillbeshowninthedelippingexperimentinSetion5.5. The spline interpolationapproahprovidessubstantiallyworseresultsforallases. 5.4.2 Fineeet of the `topology'ofthe missingsamples

The reoveryor approximation performane of sparse approahes is often as-sessed as afuntion of the sparsitydegree and the numberof observations in the ase of a random measurement matrix [43℄. However, thelatter assump-tion,reentlyhighlightedintheompressedsensingframework,doesnotholdin manyaudioinpaintingappliations: asintroduedinthepreviousexperiment, onemustdealwith bloksof onseutivemissing samples. In thisexperiment, wequestionthisassumption andassessempirialperformaneasafuntion of therandomnessandtheonseutivenessoftheloationofthemissingsamples. The maximum randomnessis ahievedwhen the missing samples are isolated and distributed aordingto e.g. a uniform law. Conversely, when they are grouped, themissing samplesin agiven blok arenot randomlyloated, even if thebloksthemselvesmay be randomlyloated. Hene the question: fora xednumberofmissingsamples,to whihextentistheinpaintingoffewlarge

(19)

N

miss = 12

N

miss = 36

N

miss = 60

N

miss = 120

N

miss = 180

N

miss = 240

10

0

10

1

10

2

10

20

30

40

50

Hole size (samples)

SNR

m

(dB)

Fixed parameters

10

0

10

1

10

2

10

20

30

40

50

Hole size (samples)

SNR

m

(dB)

Oracle parameters

10

0

10

1

10

2

2

4

6

8

10

12

14

Hole size (samples)

dB

Difference (Oracle−Fixed)

Figure3: PerformaneoftheOMP-Galgorithm,ontheSpeeh8kHzdataset, asafuntionoftheholesize,fordierentvaluesofthenumber

N

miss

ofmissing samplesinaframe: estimationbytheproposedalgorithmwithxedparameters (left),idealestimationwiththebeststoppingparameters

K

OMP

and

ǫ

seleted foreahobservedframe(enter)andperformanedierene(right). Theframe lengthis512samplesandtheholesizerangesfrom1sample(i.e.

0

.

12

ms,

0

.

2%

oftheframe)to240samples(i.e.

30

ms,

46

.

8%

oftheframe).

bloks a morediult problem than theinpainting of many small bloks (or isolatedsamples)?

Theexperimental protoolonsistsin thefollowingsteps: ˆ Chooseasetofframes

2

ˆ Fixthenumberofmissing samples

N

miss ; ˆ Foreah

(

a, b

)

N

2

suhthat

a

×

b

=

N

miss ; Foreahframeintheset,

* Randomlygenerate

a

holeswithlength

b

;

* Reover the samples inside the holes from the samples outside theholes;

* ComputeSNR

m

;

AveragetheSNR

m

valuesw.r.t. allframes.

WeusetheOMP-Galgorithmto reoverthesamples. Thesetofvaluesfor the numberof missing samples

N

miss

is

{12

,

36

,

60

,

120

,

180

,

240}

, whih allow a large numberof fatorizations of the form

a

×

b

=

N

miss

,

(

a, b

)

N

2

. For eahtestpoint

(

N

miss

, a, b

)

,onethousand64msframesfromthe8kHzspeeh datasetareproessed.

ResultsarepresentedintheleftplotofFig.3. Whentheholesizeis1i.e. samplesarerandomlyanduniformlydistributed,thereoveryperformaneis

2

(20)

veryhighwithSNR

m

valuesabove

35

dB,inludingtheasewherethenumber of missing samples is high (e.g.

N

miss

= 120

). When holes get larger, the performane signiantly dereases: thus, inpainting a single 12-length hole happenstobeamuhmorediultproblemthaninpaintingaframewith120 isolated missing samples. Yet, apositive performane is still obtainedfor the largestholes(e.g. SNR

m

5

dB

at

N

miss

= 100

).

The sensitivity of OMP-G to the stopping riteria was measured thanks to an orale algorithm. It onsists in applying the OMP-G algorithm with dierentvaluesof

¡

K

OMP

, ǫ

¢

,and inseletingtheset ofparametersthatgives the best performane for eah frame independently. The tested parameters were

¡

K

OMP

, ǫ

¢

©

N

2

1

.

5

,

N

2

2

, . . . ,

N

2

4

.

5

ª

×

©

10

10

,

10

9

, . . . ,

10

1

ª

. Results are presentedintheenterplotofFig.3andthedierenebetweentheoraleand blind systems is shown in the rightplot. One ansee that xing parameters is a onvenient, simpleapproximation that leadsto suboptimal but satisfying performaneomparedtotheoralease. However,adaptingtheparametersto theframetoproessmaybeworthstudying: thedierenebetweenoraleand blindperformanerangesfrom4to10dBinmostofases,showingasigniant potentialforimprovement.

5.5 Delipping experiment

Clippingrestorationisillustratedin Fig.4whenthelippinglevelis

0

.

2

. Here, theOMP-C-minmaxalgorithmisappliedtoanexampleofmusisignal,where oneanobservethatthereonstrutedsamplesarelosetotheoriginalsignal.

0

0.01

0.02

0.03

0.04

0.05

−0.5

0

0.5

time (s)

Amplitude

Figure 4: Restoration ofa musisignal: original (lightgray),lipped (blak), estimatebytheOMP-C-minmaxalgorithm(darkgray).

Inalargerexperiment, someoftheproposedmethodsforrestoringlipped signals are tested on the 3 datasets Speeh8kHz, Speeh16kHz and Mu-si16kHz. Eah sound is artiially lipped with suessive lipping levels, from

0

.

2

upto

0

.

9

with a

0

.

1

-step. Forthisexperiment,weseletedthe OMP-C-minmax,OMP-G,OMP-G-minandOMP-G-minmaxalgorithmsaftertesting allthealgorithms,sinetheresultsprovidethemostinterestingonlusions(see below).

TheperformaneofthosealgorithmsarereportedinFig.5,andshowthat: ˆ Theuseof the`min'or`minmax' delipping onstraintresultsin alarge improvementoftheSNR,ontheaverageby

3

dBfor OMP-G.A similar improvement has been obtained in the ase of OMP-C. As in previous experiments,weseethatmethodsbasedonSR,ifeientunder random-measurementonditions [23℄,annot straightforwardly reover

(21)

partially-sampled signalswhen groupsof missing samples are involved. But they areexible enoughto integrateadditional onstraintsthat leadstohigh performane.

ˆ Theminmax-onstraintOMP-G-minmaxalgorithmreahesbetterresults than the min-onstraint OMP-G-min algorithm when the lipping level is

0

.

2

. This orresponds to therangewhere the approximate value

θ

ˆ

max is lose to the atual maximum value as well as to the most degraded signals. A lose analysis of the individual restored sounds reveals that largespikes are avoided thanksto the maximum value onstraint. In a pratialappliation,themaximumvalue

θ

ˆ

max

should beadjustedbythe useruntilthebestaudioqualityisahieved.

ˆ The omparison between OMP-C-minmax and OMP-G-minmax shows thattheinitial-phasemodelingbytheGaborditionarysigniantly im-provestheperformane,asalreadyobservedinSetion 5.4.1.

Performane omparison is obtained using two onurrent methods: the ClipFix Audaityplug-inbasedonubiinterpolation,theCute Studio Delip ommerialsoftware

3

andJanssen'smethod[2℄basedonlinearpredition. The OMP-G-minmaxalgorithm isompared againstthesemethods andresultsare showninFig.6. Ontheaverage,OMP-G-minmaxoutperformJanssen'smethod by

2

.

8

dB for the Speeh8kHz dataset; by

0

.

5

dB for the Speeh16kHz dataset; and by

3

dB for the Musi16kHz dataset. Lower performane is obtainedfromtheCuteStudioDelipsoftware,forwhihtheunderlying restora-tiontehniqueisunknown. TheClipFix plug-inreahespoorresults,belowall thereportedones.

6 Conlusions

Inthispaper,wehavepresentedtheAudioInpaintingframeworkasthegeneral problem of restoring distorted or missing audio data based on the available reliable data. We have dened Audio Inpainting as an inverse problem, and following from image inpainting approahes, we have proposed to use sparse representation methods to restore in thetime domain the audio samplesthat aredistortedormissing.

Using a frame-based proessing of the audio signal, we have adapted the OrthogonalMathingPursuit algorithmtoaddresstheAudio Inpainting prob-lem,witheitheradisreteosineorGaborditionary. Theperformaneofthis algorithm hasbeenshownto beomparable to orbetter thanstate-of-the-art methods whenbloksof samplesofvariabledurationsweremissing,andOMP withtheGaborditionaryhasbeenfoundtogivebetterresultsthanOMPwith DCTditionary. Moreover,ithasbeenshownthatthesizeoftheblokof miss-ingsamplesismoreruialforgoodsignalrestorationthantheoverallnumber ofmissingsamplestoestimate. Forthespeialaseofaudiodelipping,a on-strainedmathingpursuitapproahhasbeenapplied,thattakesintoaounta priorianduser-speiedknowledgeabouttheamplitudeoftherestoredsignal. This approahhasbeenshowntosigniantlyenhanetheperformaneofthe algorithm, whih also outperforms state-of-the-art andommerially available

(22)

OMP-G-minmax

OMP-G-min

OMP-G

OMP-C-minmax

0

0.5

1

−2

0

2

4

6

8

10

12

Music@16kHz

Clipping level

0

0.5

1

−2

0

2

4

6

8

10

12

Speech@16kHz

Clipping level

0

0.5

1

−2

0

2

4

6

8

10

12

Speech@8kHz

Average SNR

m

improvement (dB)

Clipping level

Clippinglevel

0

.

2

0

.

3

0

.

4

0

.

5

0

.

6

0

.

7

0

.

8

0.9 Speeh8kHz

6

.

2

dB

8

.

2

dB

10

.

1

dB

11

.

9

dB

13

.

7

dB

16

.

5

dB

19

.

2

dB 24.4dB Speeh16kHz

6

.

3

dB

8

.

4

dB

10

.

3

dB

12

.

2

dB

14

.

2

dB

16

.

8

dB

19

.

8

dB 24.8dB Musi16kHz

6

.

4

dB

9

.

1

dB

11

.

4

dB

13

.

4

dB

15

.

0

dB

17

.

0

dB

19

.

4

dB 25.1dB Figure5:AverageSNR

m

improvement,fromtheinitialSNRoflippedsignalsto

theSNRofrestoredsignals,asafuntionofthelippinglevel: foreahdataset i.e. eahsubgure,theperformaneispresentedasafuntionofthelipping level,forOMP-G, OMP-G-min,OMP-G-minmaxandOMP-C-minmax.

OMP-G-minmax

Janssen

ClipFix

CuteStudioDeClip

No declipping

0

0.5

1

5

10

15

20

25

30

35

Music@16kHz

Clipping level

0

0.5

1

5

10

15

20

25

30

35

Speech@16kHz

Clipping level

0

0.5

1

5

10

15

20

25

30

35

Speech@8kHz

Average SNR

m

(dB)

Clipping level

Figure6: AverageSNR

m

asafuntionofthelippinglevel: foreahdataset,the performaneispresentedasafuntionofthelippinglevel,forBOMP-minmax, forJanssen'sapproah[2℄andforthesplineinterpolation(Spline). Theinitial SNR ofthelippedsignalisalsoplotted(Clipped).

(23)

methods foraudiodelipping.

Based onthe audio inpainting framework and on the baselineresults pre-sentedinthispaper,anumberoffuturediretionsmaybeinvestigated. T ehni-ally,onemayomparetheOMP-basedmethodsto

l

1

-minimizationtehniques, known to be another family of approahes to deal with sparse models. They are theoretially eient but so far, we an only report preliminary perfor-manethat islowerthanwithgreedyalgorithmsforaudioinpainting. Another perspetive is the use of new sparse models for audio signals. In partiular, struturedsparsemodelsandlearnedditionaryarepromisingdiretions. From anappliation pointof view,time-frequenyaudioinpaintingis anew investi-gation eld for sparse approahes. Using the formulation of audio inpainting (see Setion 2.1) in thetime-frequeny domain, one must then introdue new ditionaries, targetting appliations likesoure separation and bandwidth ex-tension.

Referenes

[1℄ A.Adler,V.Emiya,M.Jafari,M.Elad,R.Gribonval,andM.D.Plumbley, A Constrained Mathing Pursuit Approah to Audio Delipping, in IEEEInt.Conf.onAoustis,SpeehandSignalProessing,Prague,Czeh Republi,May2011.

[2℄ A.Janssen,R.Veldhuis,andL.Vries,Adaptiveinterpolationof disrete-timesignalsthatanbemodeledasautoregressiveproesses,IEEETrans. Aoustis,Speeh andSig. Pro., vol.34,no.2,pp.317330,apr1986. [3℄ W. Etter, Restorationof a disrete-timesignal segmentby interpolation

basedon the left-sided and right-sided autoregressiveparameters, IEEE TransationsonSignalProessing,vol.44,no.5,pp.11241135,may1996. [4℄ S.J.GodsillandP.J.W.Rayner,DigitalAudioRestoration-AStatistial

Model-basedApproah. Springer-Verlag,1998.

[5℄ M.Lagrange andS. Marhand,Longinterpolationof audiosignalsusing linearpreditionin sinusoidalmodeling, Journal of the AudioEng. So., vol.53,pp.891905,2005.

[6℄ A. Dahimene, M. Noureddine, and A. Azrar, A simplealgorithmfor the restoration of lipped speeh signal, Informatia, vol. 32, pp. 183188, 2008.

[7℄ J.S.AbelandJ.O.Smith,Restoringalippedsignal,inIEEEInt.Conf. onAoustis,SpeehandSignalProessing,Toronto,Canada,May1991. [8℄ R. C.Maher, A method forextrapolation ofmissing digitalaudiodata,

in95th AESConvention, 1993.

[9℄ M. Cooke, P. Green, L. Josifovski, and A. Vizinho, Robust automati speeh reognition with missing and unreliable aousti data, Speeh Communiation,vol.34, no.3,pp.267285,2001.

(24)

[10℄ J.Gemmeke,H.VanHamme,B.Cranen,andL.Boves,Compressive sens-ingformissingdataimputationinnoiserobustspeehreognition,Seleted Topisin SignalProessing, IEEE Journalof, vol.4, no.2,pp.272287, 2010.

[11℄ J. Le Roux, H. Kameoka, N. Ono, A. de Cheveigné, and S. Sagayama, Computational auditory indution as a missing-data model-tting problemwithBregmandivergene, Speeh Communiation,vol.InPress, 2010.

[12℄ M.Dietz,L.Liljeryd, K.Kjorling,and O.Kunz,SpetralBand Replia-tion, anovel approah in audio oding, in Pro. of the 112th AES Con-vention. Munih, Germany: Audio EngineeringSoiety;1999,May2002. [13℄ E. Larsen and R. Aarts, Audio bandwidth extension: appliation of

psy-hoaoustis,signal proessingandloudspeakerdesign. Wiley,2004. [14℄ P. Smaragdis, B. Raj, and M. Shashanka, Missing data imputation for

spetralaudiosignals, inPro. of MLSP,Grenoble,Frane,Sep.2009. [15℄ M.Moussallam,P. Leveau,andS. M.AzizSbai, Soundenhanement

us-ingsparseapproximationwithspelets, inIEEE Int.Conf.onAoustis, SpeehandSignalProessing,Mar.2010,pp.221224.

[16℄ C.Perkins,O.Hodson,andV.Hardman,Asurveyofpaketlossreovery tehniquesforstreamingaudio, Network,IEEE,vol.12,no.5,pp.4048, sep.1998.

[17℄ H. Or, D. Malah, and I. Cohen, Audio paket loss onealment in a ombined mdt-mdst domain, Signal Proessing Letters, IEEE, vol. 14, no.12,pp.10321035,de.2007.

[18℄ G.CohiandA.Unini,Subband neuralnetworkspreditionforon-line audio signal reovery, IEEE Transations on Neural Networks,, vol. 13, pp.867876,2002.

[19℄ S. J. Godsill, P. J. Wolfe, and W. N. W. Fong, Statistial model-based approahesto audiorestorationand analysis, Journal of New Musi Re-searh,vol.30,no.4,2001.

[20℄ J. Barker, Computational auditory sene analysis: priniples, algorithms, andappliations. IEEEPress/Wiley-Intersiene,2006,h.Robust Auto-matiSpeehReognition,pp.297350.

[21℄ M.Bertalmio,G.Sapiro,V.Caselles,andC.Ballester,Imageinpainting, in Pro. of 27th Conf. on Computer graphis and interative tehniques. ACMPress/Addison-WesleyPublishingCo.,2000,pp.417424.

[22℄ M. Elad, J. L.Stark, P. Querre, and D. L. Donoho, Simultaneous ar-toonandtextureimageinpaintingusingmorphologialomponentanalysis (ma), Applied andComputational HarmoniAnalysis, vol. 19, pp.340 358,2005.

[23℄ M.Elad, Sparse and RedundantRepresentations- From Theory to Appli-ations inSignalandImage Proessing. SpringerNew-York,2010.

(25)

[24℄ S.MallatandZ.Zhang,Mathingpursuitswithtime-frequeny ditionar-ies, IEEE Trans. On Signal Proessing, vol. 41, no. 12, pp. 33973415, De.1993.

[25℄ M.D. Plumbley,T.Blumensath,L.Daudet,R.Gribonval,andM.Davies, Sparserepresentationsinaudioandmusi: fromodingtosoure separa-tion, Pro.of the IEEE,vol.98,no.6,2010.

[26℄ M. Aharon,M. Elad, and A. Brukstein, K-SVD:An Algorithmfor De-signingOverompleteDitionariesforSparseRepresentation,IEEETrans. OnSignalProessing,vol.54,no.11,pp.43114322,Nov.2006.

[27℄ J.Mairal, M.Elad, and G.Sapiro,Sparserepresentationforolor image restoration, IEEE Trans.onImage Proessing,vol.17, no.1,pp.5369, 2008.

[28℄ D. GrinandJ.Lim, Signalestimationfrommodiedshort-timefourier transform, IEEE Trans. Aoustis, Speeh and Sig. Pro., vol.32, no. 2, pp.236243,Apr.1984.

[29℄ M.DaviesandL.Daudet,SparseaudiorepresentationsusingtheMCLT, SignalProessing,vol.86,no.3,pp.457470,2006.

[30℄ E. Ravelli, G. Rihard, and L. Daudet, Union of mdt bases for audio oding, IEEETrans.onAudio,Speeh,andLanguageProessing,vol.16, no.8,pp.13611372,nov.2008.

[31℄ B.Natarajan, Sparseapproximate solutionsto linearsystems, SIAMJ. Computing,vol.25,no.2,pp.227234,1995.

[32℄ G.Davis,S.Mallat,andM.Avellaneda,Adaptivegreedyapproximations, Constr.Approx.,vol.13,no.1,pp.5798,1997.

[33℄ Y. Pati, R. Rezaiifar, and P. Krishnaprasad, Orthogonal mathing pur-suit: reursivefuntionapproximationwithappliationstowavelet deom-position,inPro.Asilomar Conf.Signals, Systems,andComputers,Nov. 1993,pp.4044vol.1.

[34℄ R.Gribonval,Fastmathingpursuitwithamultisaleditionaryof gaus-sian hirps, IEEE Trans. on Signal Proessing, vol. 49, no. 5, pp. 994 1001,May2001.

[35℄ L.Peotta andP.Vandergheynst,Mathing pursuitwithblokinoherent ditionaries, Signal Proessing, IEEE Transations on,vol.55,no.9,pp. 45494557,2007.

[36℄ R.GribonvalandE.Bary,Harmonideompositionofaudiosignalswith mathingpursuit, IEEE Trans. SignalProessing,vol.51, no.1,pp.101 111,2003.

[37℄ Y. Eldar and H. Bolskei, Blok-sparsity: Coherene and eient reov-ery,inIEEEInt.Conf.onAoustis,SpeehandSignalProessing,Taipei, Taiwan,2009,pp.28852888.

(26)

[38℄ H.Mansour,R.Saab,P.Nasiopoulos,andR.Ward,Colorimage desatura-tionusingsparsereonstrution,in IEEEInt.Conf.onAoustis,Speeh andSignalProessing,Dallas,TX,USA,Mar.2010.

[39℄ Methodfor objetive measurementsof pereived audio quality,ITU-R Std. BS.1387,De.1998.

[40℄ R. Huberand B. Kollmeier, Pemo-q -anew method for objetiveaudio qualityassessmentusingamodelofauditorypereption,IEEE Trans.on Audio, Speeh, and Language Proessing, vol. 14, no. 6, pp. 1902 1911, 2006.

[41℄ V. Emiya, E. Vinent, N. Harlander, and V. Hohmann, Subjetive and objetivequality assessmentofaudiosoure separation, IEEE Trans. on Audio,Speeh,andLanguage Proessing,vol.inpress,2011.

[42℄ E. Vinent, S. Araki, and P. Boll, The 2008 signal separation ev alua-tion ampaign: A ommunity-based approah to large-saleevaluation. Paraty,Brazil: Springer,Mar.2009.

[43℄ D. Donoho, Y. Tsaig, I.Drori, and J. Stark, SparseSolution of Under-determinedLinearEquationsbyStagewiseOrthogonalMathingPursuit, preprint,2006.

(27)

Contents

1 Introdution 3

2 AudioInpaintingProblemStatement 5

2.1 Formulationofaudioinpainting . . . 5

2.2 Inpaintingsamplesdistortedbyimpulsivenoise . . . 5

2.3 Inpaintingintervalsofmissingsamples . . . 6

3 Time-domainframeworkand models 6 3.1 Frame-basedproessingand reonstrution . . . 6

3.2 SparseRepresentationsmodelingof audioframes . . . 7

3.3 Ditionaries . . . 7

3.3.1 DisreteCosineTransform(DCT) ditionary . . . 8

3.3.2 Gaborditionary . . . 8

4 Audio inpainting algorithms based on Orthogonal Mathing Pursuit 8 4.1 OrthogonalMathingPursuit (OMP)algorithmforinpainting. . 9

4.2 Atomseletion . . . 10

4.3 Solutionupdate . . . 10

4.4 Algorithmienhanementsforinpaintinglippedsignals . . . 11

4.4.1 The`min'delippingonstraint . . . 11

4.4.2 The`max'delippingonstraint. . . 11

4.4.3 The`minmax'onstrainedSRproblem . . . 11

4.4.4 OMPdelipping algorithm . . . 12

5 ExperimentalResults 12 5.1 Performanemeasures . . . 13

5.2 Theolletionoftestedsignals . . . 13

5.3 Parametersettings . . . 14

5.4 Inpaintingexperiments. . . 14

5.4.1 Globaleetof thedurationofmissingintervals . . . 14

5.4.2 Fineeetofthe`topology'ofthemissingsamples . . . . 15

5.5 Delippingexperiment . . . 17

(28)

IRISA, Campus universitaire de Beaulieu - 35042 Rennes Cedex (France)

Centre de recherche INRIA Bordeaux – Sud Ouest : Domaine Universitaire - 351, cours de la Libération - 33405 Talence Cedex

Centre de recherche INRIA Grenoble – Rhône-Alpes : 655, avenue de l’Europe - 38334 Montbonnot Saint-Ismier

Centre de recherche INRIA Lille – Nord Europe : Parc Scientifique de la Haute Borne - 40, avenue Halley - 59650 Villeneuve d’Ascq

Centre de recherche INRIA Nancy – Grand Est : LORIA, Technopôle de Nancy-Brabois - Campus scientifique

615, rue du Jardin Botanique - BP 101 - 54602 Villers-lès-Nancy Cedex

Centre de recherche INRIA Paris – Rocquencourt : Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex

Centre de recherche INRIA Saclay – Île-de-France : Parc Orsay Université - ZAC des Vignes : 4, rue Jacques Monod - 91893 Orsay Cedex

Centre de recherche INRIA Sophia Antipolis – Méditerranée : 2004, route des Lucioles - BP 93 - 06902 Sophia Antipolis Cedex

Éditeur

INRIA - Domaine de Voluceau - Rocquencourt, BP 105 - 78153 Le Chesnay Cedex (France)

Gambar

Figure 1: Examples of restoration problems related to inpainting.
Table 2: Summary of the proposed algorithms: ea
h row indi
ates the algorithm
Figure 2: Performan
e of inpainting algorithms as a fun
tion of the duration
Figure 3: Performan
e of the OMP-G algorithm, on the Spee
h8kHz dataset,
+3

Referensi

Dokumen terkait

Data yang terkumpul baik data primer maupun sekunder diolah sesuai dengan kebutuhan baik secara manual maupun secara komputerisasi, kemudian dianalisis dengan cara

Apakah instrumen evaluasi yang lain tidak perlu dipergunakan, seperti pengamatan (observasi), uji keterampilan. Sepertinya tahapan ini dalam menentukan kebijakan UN

Memberikan penjelasan tentang pola asuh yang paling sesuai untuk anak usia dini dan kaitannya dengan pengalaman sebagai pendidik anak

Untuk melihat kesiapan pemerintah dalam mengembangan sistem e-Government, maka telah dipilih beberapa situs atau portal web pemerintah daerah (Pemda) provinsi atau

[r]

Kebijakan dividen adalah keputusan perusahaan mengenai pembagian laba yang diperoleh kepada para pemegang saham sebagai dividen atau ditahan dalam bentuk laba atau retained

4.2.2 Gambaran Perilaku Kewirausahaan Pengusaha Mochi Berdasarkan Jenis Kelamin, Usia, Latar Belakang Pendidikan Formal dan.. Pengalaman

Pemodelan Sistem Informasi Berorientasi Objek dengan UML.. Graha