Chapter Title: Fourier Transform, Short-Time Fourier Transform, and Wavelet Transform
Copyright Year: 2019
Copyright Holder: Springer Nature Switzerland AG

Corresponding Author: Komal R. Borisagar, E. C. Department, Atmiya Institute of Technology and Science, Rajkot, Gujarat, India. Email: [email protected]
Author: Rohit M. Thanki, C. U. Shah University, Wadhwan City, Gujarat, India
Author: Bhavin S. Sedani, E. C. Department, L. D. Engineering College, Ahmedabad, India
Abstract: This chapter presents information about the Fourier transform (FT), the short-time Fourier transform (STFT), and the wavelet transform. It also covers the use of these transforms in speech signal processing.
Keywords: Signal transformation - Fourier transform (FT) - Short-time Fourier transform (STFT) - Wavelet
Chapter 4
Fourier Transform, Short-Time Fourier Transform, and Wavelet Transform

4.1 Fourier Transform (FT)
The Fourier transform (FT) transforms a time domain speech signal into its corresponding frequency domain representation. The FT of a speech signal can be calculated using Equation 4.1 [1, 2]. The FT gives complex-valued coefficients of the speech signal.
S(k) = \sum_{n=0}^{N-1} S(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \ldots, N-1    (4.1)
where S(n) is the input speech signal in the time domain and S(k) is the transformed speech signal in the frequency domain.
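As an illustration of Equation 4.1, the following sketch computes the coefficients S(k) directly and compares them with NumPy's FFT. The sampling rate and the synthetic two-tone "speech frame" are assumptions made purely for the example.

```python
import numpy as np

# Hypothetical speech frame: a 250 Hz tone plus a weaker 1 kHz tone (assumed values).
fs = 8000                         # sampling rate in Hz
N = 256                           # frame length
n = np.arange(N)
s = np.sin(2 * np.pi * 250 * n / fs) + 0.3 * np.sin(2 * np.pi * 1000 * n / fs)

# Direct evaluation of Eq. (4.1): S(k) = sum_n S(n) exp(-j 2 pi n k / N)
k = np.arange(N)
S = np.array([np.sum(s * np.exp(-2j * np.pi * n * kk / N)) for kk in k])

# The FFT evaluates the same sum efficiently; both give complex-valued coefficients.
assert np.allclose(S, np.fft.fft(s))

# Bin k corresponds to frequency k * fs / N; the two largest magnitudes in the
# lower half of the spectrum sit at the two tone frequencies (250 Hz and 1000 Hz).
freqs = k * fs / N
peaks = np.argsort(np.abs(S[:N // 2]))[-2:]
print(np.sort(freqs[peaks]))
```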
According to the literature [3], the FT has some limitations when it is applied to a speech signal:
• The FT cannot provide simultaneous time and frequency localization.
• The FT is not very useful for analyzing time-variant, nonstationary signals.
• The FT is not appropriate for representing discontinuities in the signal.
4.2 Short-Time FT
The short-time FT (STFT) segments the speech signal into narrow time intervals and takes the FT of each segment. Each FT then provides the spectral information of one segment, so that simultaneous time and frequency information is obtained [3]. The steps for applying the STFT to a speech signal are as follows:
• Choose a window function of finite length.
• Place the window on top of the signal at t = 0.
• Truncate the signal using this window.
• Compute the FT of the truncated signal and save the results.
• Incrementally slide the window to the right.
• Go to step 3 and repeat the process until the window reaches the end of the signal.
The STFT of the speech signal can be calculated by using Equation 4.2 [3].
\mathrm{STFT}_f(t', u) = \int_t \left[ f(t)\, W(t - t') \right] e^{-j 2\pi u t}\, dt    (4.2)
where t is the time parameter of the signal, u is the frequency parameter of the signal, f(t) is the input signal, and W is the windowing function.
In the STFT, the windowing function plays an important role. The window should be narrow enough that the portion of the signal falling within it can be regarded as stationary. However, a narrow window does not give good localization of the signal in the frequency domain. If the window function is infinitely long, the STFT turns into the FT and provides good frequency localization but no time localization. If the window function is infinitely short, the STFT provides good time localization but no frequency localization. This trade-off is one of the limitations of the STFT when it is applied to a speech signal, and it is overcome by the wavelet transform.
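As a concrete illustration of the steps listed above, the sketch below applies an STFT to a synthetic nonstationary signal using SciPy. The sampling rate, tone frequencies, and the 32 ms Hann window are assumptions chosen only for the demonstration.

```python
import numpy as np
from scipy.signal import stft

# Assumed test signal: 300 Hz for the first half second, 1200 Hz for the second half.
fs = 8000
t = np.arange(fs) / fs
x = np.where(t < 0.5,
             np.sin(2 * np.pi * 300 * t),
             np.sin(2 * np.pi * 1200 * t))

# Hann window of 256 samples (32 ms) with 50% overlap: the window slides along the
# signal and an FT is taken of each windowed segment, as in the steps listed above.
f, tau, Zxx = stft(x, fs=fs, window='hann', nperseg=256, noverlap=128)

# Zxx holds one FT per window position: rows are frequency bins, columns are times.
early = np.argmax(np.abs(Zxx[:, 5]))      # a frame near the start of the signal
late = np.argmax(np.abs(Zxx[:, -5]))      # a frame near the end of the signal
print(f[early], f[late])                  # approximately 300 Hz and 1200 Hz
```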
4.3 Wavelet Transform (WT)
Wavelet transform analysis uses small waves, or functions, known as wavelets. A wavelet can be described more precisely as a localized wavelike function. Some of the most commonly used wavelets are shown in Fig. 4.1. Using wavelets, any signal can be transformed from one representation into another in which more useful information can be found; this process is known as the wavelet transform. Mathematically, the wavelet transform can be described as the convolution between the wavelet function and the signal under observation.
Fig. 4.1 Some basic wavelets
A wavelet can be manipulated in two ways: it can be translated to different locations on the signal, and it can be stretched or compressed (dilated) as required. The wavelet transform measures, on a local basis, how well the signal matches the wavelet; the scheme is represented in Fig. 4.2. As shown in Fig. 4.3, whenever the shape of the signal matches the wavelet at a specific scale and location, a large transform value is observed. On the other hand, if the match is poor, a low transform value is obtained. The transform value is then placed, as shown by a black dot in Fig. 4.4, on the plane of the two-dimensional transform. Computing the transform for various scales of the wavelet and at various locations of the signal is done continuously for the continuous wavelet transform (CWT) or in discrete steps for the discrete wavelet transform [4].
The wavelet transform translates a signal into another form that makes certain parameters of the original signal easier to analyze, and it provides a picture of the correlation between the signal and the wavelet at different scales and locations. Computing a wavelet transform requires a small wavelike function which, as the name suggests, is a waveform limited to a small region. A wavelet is a function that satisfies certain mathematical criteria. These functions are manipulated through a process of translation and dilation to change the signal into a different form that 'unfolds' it in time and scale. The signal to be analyzed is typically a temporal signal, that is, a function that varies with time, such as the velocity of a fluid, engine-induced vibration data, or an ECG signal. In some applications the independent variable is space rather than time, but the analysis is carried out in the same way. Such a small wave is shown in Fig. 4.5; it is localized on the time axis. There are many types of wavelets available to select from for data analysis. The best choice for a specified application generally depends on the nature of the signal and what is required from the analysis [4].
Fig. 4.2 Location of wavelet
Fig. 4.3 Scale of wavelet
Fig. 4.4 Wavelet function, speech signal, and transform
Fig. 4.5 Four wavelets (a) Gaussian wave (b) Mexican Hat (c) Haar (d) Morlet
The Mexican hat wavelet is one of the important wavelet functions; it illustrates many properties of continuous wavelet transform (CWT) analysis. The Mexican hat wavelet is defined as

\Psi(t) = (1 - t^2)\, e^{-t^2/2}    (4.3)
The wavelet function given by the foregoing equation is known as the mother wavelet. The basic requirements for any function to serve as a wavelet are as follows:

1. The wavelet must have finite energy:

E = \int_{-\infty}^{\infty} |\Psi(t)|^2 \, dt < \infty    (4.4)

Here the energy is defined as the integral of the squared magnitude of Ψ(t) over all time. For complex Ψ(t), the energy must be computed from both the magnitude and the phase of Ψ(t) [5].
2. If \hat{\Psi}(f) = \int_{-\infty}^{\infty} \Psi(t)\, e^{-i(2\pi f)t} \, dt is the Fourier transform of Ψ(t), then the following condition must hold:

C_g = \int_{0}^{\infty} \frac{|\hat{\Psi}(f)|^2}{f} \, df < \infty    (4.5)

This condition implies that \hat{\Psi}(0) = 0, that is, the wavelet has no zero-frequency component; equivalently, the wavelet Ψ(t) must have zero mean. Equation 4.5 is known as the admissibility condition, and C_g is known as the admissibility constant. Its value depends on the chosen wavelet.
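The following sketch checks these two conditions numerically for the Mexican hat wavelet of Equation 4.3 and then computes a simple CWT by correlating the signal with translated, dilated copies of the wavelet. The test signal, scale range, and normalization are assumptions made for illustration, not part of the original text.

```python
import numpy as np

def mexican_hat(t):
    """Mother wavelet of Eq. (4.3): psi(t) = (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

# Numerical check of Eq. (4.4) (finite energy) and the zero-mean requirement of Eq. (4.5).
t = np.linspace(-8.0, 8.0, 4001)
dt = t[1] - t[0]
psi = mexican_hat(t)
print("energy:", np.sum(np.abs(psi) ** 2) * dt)    # finite, about 1.33
print("mean:  ", np.sum(psi) * dt)                 # approximately zero

# Minimal CWT sketch: for each scale a, correlate the signal with psi((t - b)/a)/sqrt(a).
fs = 1000.0
ts = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 20 * ts) + 0.5 * np.sin(2 * np.pi * 80 * ts)

scales = np.arange(1, 32)
cwt = np.empty((len(scales), len(signal)))
for i, a in enumerate(scales):
    k = np.arange(-8 * a, 8 * a + 1)               # grid wide enough for the wavelet to decay
    wavelet = mexican_hat(k / a) / np.sqrt(a)
    cwt[i] = np.convolve(signal, wavelet[::-1], mode='same') / fs

# Each row is the transform at one scale; large magnitudes mark locations where the
# dilated wavelet matches the local shape of the signal.
print(cwt.shape)                                    # (31, 1000)
```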
4.4 Comparison of the Wavelet Transform (WT) with FT and STFT
Wavelet analysis compares the signal at several magnifications, that is, with distinct resolutions. Fourier analysis is performed using basic building blocks, also known as time–frequency atoms, namely sine and cosine waves. In wavelet analysis there are two kinds of wavelets, the mother wavelet and the child wavelets. The mother wavelet oscillates and is translated and dilated to generate the child wavelets; together they serve as the building blocks of wavelet analysis.
The Fourier series is useful only for periodic signals; the Fourier transform can be used for frequency analysis of general nonperiodic functions. The Fourier transform is used for analyzing a time domain signal in the frequency domain, and the processing is carried out in three steps. First, the signal is transformed from the time domain to the frequency domain. Next, the frequency domain coefficients are modified according to the requirement. Last, the effect of the modification is viewed in the time domain by applying the inverse transform, which converts the frequency domain signal back into the time domain. Here the Fourier coefficients represent the contributions of cosine and sine functions of different frequencies. The FT of the signal f(t) is given by
F(\omega) = \int f(t)\, e^{-i\omega t} \, dt    (4.6)
The inverse Fourier transform performs the reverse operation, converting data from the frequency domain back to the time domain. The inverse FT is represented by
f(t) = \frac{1}{2\pi} \int F(\omega)\, e^{i\omega t} \, d\omega    (4.7)
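A small numerical sketch of the three-step procedure described above (transform, modify, inverse transform) is given below, using NumPy's discrete FFT and IFFT as stand-ins for Equations 4.6 and 4.7. The test signal and the 100 Hz cutoff are assumed for the example.

```python
import numpy as np

fs = 1000
t = np.arange(fs) / fs
f_t = np.sin(2 * np.pi * 50 * t) + 0.4 * np.sin(2 * np.pi * 300 * t)

# Step 1: transform the signal from the time domain to the frequency domain.
F = np.fft.fft(f_t)
freqs = np.fft.fftfreq(len(f_t), d=1 / fs)

# Step 2: modify the coefficients, here zeroing everything above 100 Hz (a crude low-pass).
F_mod = np.where(np.abs(freqs) <= 100, F, 0)

# Step 3: apply the inverse transform to view the effect back in the time domain.
f_filtered = np.real(np.fft.ifft(F_mod))

# Without modification, the inverse transform recovers the original signal.
assert np.allclose(f_t, np.real(np.fft.ifft(F)))
print(np.max(np.abs(f_filtered)))    # about 1.0: the 300 Hz component has been removed
```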
The analysis coefficients F(ω), which describe the notion of global frequency in the signal, are calculated as inner products of the signal with sinusoidal basis functions of unbounded duration. Hence, the foregoing Fourier transform works well for signals composed of a few stationary components. Unpredictable changes in time in a nonstationary signal f(t), however, spread out over the entire frequency axis in F(ω). To overcome this problem, the short-time Fourier transform (STFT) can be used [6].
The dependence of frequency on time can be described by the instantaneous frequency. If the signal is broadband, however, the instantaneous frequency merely averages the different spectral components present at each time. A two-dimensional time–frequency representation is therefore required to describe and observe how the spectral characteristics of the signal depend on time.
Now consider the signal f(t) viewed through a window function g(t) of limited duration, centered at time location τ. In that case, the STFT of the signal can be defined as

\mathrm{STFT}(\tau, \omega) = \int_{-\infty}^{\infty} f(t)\, g^{*}(t - \tau)\, e^{-i\omega t} \, dt    (4.8)
The STFT maps the signal into a two-dimensional function on the time–frequency plane, and its performance critically depends on the window function g(t). Figure 4.6 shows the time–frequency plane corresponding to the STFT; the vertical stripe represents windowing in the time domain, with the STFT frequencies computed around the window placed at time t. An alternative view of the same procedure is based on a filter bank interpretation. Consider now the capacity of the STFT to differentiate between two pure sinusoids.
Let G(f) denote the Fourier transform of the windowing function g(t). The bandwidth Δf of the filter can be defined as

\Delta f^2 = \frac{\int f^2\, |G(f)|^2 \, df}{\int |G(f)|^2 \, df}    (4.9)
where the denominator \int |G(f)|^2 \, df represents the energy of g(t). Two sinusoids can be differentiated only if their separation is more than Δf, which is known as the frequency resolution of the STFT analysis.
Likewise, the spread in time is represented by Δt, defined analogously with the denominator \int |g(t)|^2 \, dt giving the energy of g(t). Two pulses in time can be discriminated only if their separation is more than Δt, which is known as the time resolution of the STFT analysis. Because the product of the time resolution and the frequency resolution is lower bounded by the Heisenberg uncertainty principle given below, the two resolutions cannot both be made arbitrarily small:
\text{Time-bandwidth product} = \Delta t \, \Delta f \geq \frac{1}{4\pi}    (4.10)

Thus, the only possibility is to trade frequency resolution for time resolution, or vice versa. Gaussian windows meet the above bound with equality and are therefore the most widely used. In the STFT, once a window has been selected, the time–frequency resolution is fixed over the whole time–frequency plane, because the same window is used for all frequencies.
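The bound in Equation 4.10 can be checked numerically for a Gaussian window using the definitions of Δt and Δf given above. The window width and sampling grid below are assumptions chosen only for the demonstration.

```python
import numpy as np

sigma = 0.01                       # assumed 10 ms Gaussian window
dt = 1e-5
t = np.arange(-0.2, 0.2, dt)
g = np.exp(-t ** 2 / (2 * sigma ** 2))

# RMS duration: Delta_t^2 = int t^2 |g(t)|^2 dt / int |g(t)|^2 dt
delta_t = np.sqrt(np.sum(t ** 2 * np.abs(g) ** 2) / np.sum(np.abs(g) ** 2))

# RMS bandwidth from Eq. (4.9), computed from the window spectrum G(f).
G = np.fft.fft(g)
f = np.fft.fftfreq(len(t), d=dt)
delta_f = np.sqrt(np.sum(f ** 2 * np.abs(G) ** 2) / np.sum(np.abs(G) ** 2))

# The Gaussian meets the uncertainty bound with (near) equality.
print(delta_t * delta_f, 1 / (4 * np.pi))    # both approximately 0.0796
```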
Fig. 4.6 Time–frequency plane for STFT (modulated filter bank view)
A nonperiodic signal f(t) cannot be accurately represented by a sum of periodic sine and cosine functions. One solution is to artificially extend the signal to make it periodic, but even that requires continuity at the endpoints. The windowed Fourier transform (WFT) is another solution that gives time and frequency information simultaneously. For this purpose, the signal is first divided into small segments, and the WFT of each segment is computed for separate frequency analysis. To handle short transitions in the signal, windowing is applied so that each section converges to zero at its endpoints; this is accomplished by a weight function that places more emphasis on the middle than on the endpoints. The effect of the window is to localize the signal in time. The DFT and DWT are both linear transforms that generate a data structure of segments of various lengths, and the mathematical properties of the matrices involved are similar. Both the DFT and the DWT can be viewed as rotations into a different domain: for the DFT the new domain contains cosine and sine basis functions, and for the wavelet transform it contains the mother wavelet. The key difference between the two transforms is that the sine and cosine functions of the FT are not localized in space, whereas each individual wavelet function of the WT is localized in space (Fig. 4.7).
Because of its sparseness properties, the wavelet transform can be used for applications such as data compression, noise removal, and image processing. The time–frequency resolution is another major difference between the FT and the WT. Figure 4.8 shows the time–frequency resolution of the short-time Fourier transform and the wavelet transform.
Fig. 4.7 Time–frequency plane for Fourier basis functions

The STFT is not well suited to analyzing real signals that contain low-frequency as well as high-frequency content and whose frequency content varies with time. To overcome the time and frequency resolution limit of the STFT, wavelet analysis can be used, which allows the resolutions Δt and Δf to vary over the time–frequency plane (Fig. 4.8). Thus, multi-resolution analysis can be achieved with the wavelet transform. Looking at wavelet analysis from the filter bank point of view, the time resolution must vary with the central frequency of the analysis. In the wavelet transform the frequency span Δf is directly proportional to the central frequency f0:

\frac{\Delta f}{f_0} = \text{constant}    (4.11)
The wavelet representation is therefore realized as a filter bank of constant-Q bandpass filters. In this case, the time resolution also changes with the central frequency. The Heisenberg uncertainty principle is still satisfied, but now the time resolution becomes arbitrarily good at high frequencies, whereas the frequency resolution becomes arbitrarily good at low frequencies. Thus, wavelet analysis offers both time and frequency selectivity: two short bursts can be separated by selecting higher analysis frequencies, which increases the time resolution. Wavelet analysis is therefore appropriate when the signal contains high-frequency components of very short duration together with low-frequency components of long duration. In the wavelet transform the size of the window is not constant; it varies [6].
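The constant-Q behavior of Equation 4.11 can be illustrated numerically: dilating a wavelet scales its center frequency and its bandwidth by the same factor, so their ratio stays fixed. The Mexican hat wavelet, the scales, and the sampling grid below are assumed for the demonstration.

```python
import numpy as np

def mexican_hat(t):
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

dt = 0.01
t = np.arange(-500.0, 500.0, dt)

for a in (1.0, 2.0, 4.0, 8.0):
    psi = mexican_hat(t / a)                       # wavelet dilated by scale a
    P = np.abs(np.fft.rfft(psi)) ** 2              # its power spectrum
    f = np.fft.rfftfreq(len(t), d=dt)
    f0 = np.sum(f * P) / np.sum(P)                 # spectral centroid = center frequency
    bw = np.sqrt(np.sum((f - f0) ** 2 * P) / np.sum(P))   # RMS bandwidth around f0
    # The ratio bw / f0 is (approximately) the same at every scale.
    print(f"scale {a:4.1f}: f0 = {f0:.4f} Hz, bandwidth = {bw:.4f} Hz, ratio = {bw / f0:.3f}")
```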
4.5 Multiresolution Algorithm
The wavelet transform is also referred to as a multi-resolution algorithm [4]. To compute the approximation coefficients S_{m,n} and the discrete wavelet (detail) coefficients T_{m,n} of an input signal S_{0,n} using the decomposition algorithm, one first computes T_{1,n} and S_{1,n} from the input coefficients S_{0,n} as follows:
S_{1,n} = \frac{1}{\sqrt{2}} \sum_{k} c_k S_{0,\,2n+k}, \qquad T_{1,n} = \frac{1}{\sqrt{2}} \sum_{k} b_k S_{0,\,2n+k}    (4.12)
Fig. 4.8 Time–frequency plane for the basis functions of wavelet analysis

In the same way, S_{2,n} and T_{2,n} can be calculated from S_{1,n}:
S_{2,n} = \frac{1}{\sqrt{2}} \sum_{k} c_k S_{1,\,2n+k}, \qquad T_{2,n} = \frac{1}{\sqrt{2}} \sum_{k} b_k S_{1,\,2n+k}    (4.13)
Next, from the approximation coefficients S_{2,n}, one can find S_{3,n} and T_{3,n}, and so on up to scale index M, where one computes the single values S_{M,0} and T_{M,0}. For an input signal of length N = 2^M, the coefficient at scale index m and position index n corresponds to the dilation a = 2^m and the translation b = 2^m n. The indices of the coefficients in the full decomposition therefore range over 1 ≤ m ≤ M and 0 ≤ n ≤ 2^{M−m} − 1.
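A minimal numerical sketch of this decomposition is given below, assuming the Haar filter coefficients c = [1, 1] and b = [1, −1] (longer filter sequences are used for other wavelets); the eight-sample input signal is chosen only for illustration.

```python
import numpy as np

def analysis_step(S):
    """One level of Eqs. (4.12)-(4.13) with Haar filters: returns (S_{m+1,n}, T_{m+1,n})."""
    S = np.asarray(S, dtype=float)
    S_next = (S[0::2] + S[1::2]) / np.sqrt(2)   # (1/sqrt(2)) * sum_k c_k S_{m, 2n+k}
    T_next = (S[0::2] - S[1::2]) / np.sqrt(2)   # (1/sqrt(2)) * sum_k b_k S_{m, 2n+k}
    return S_next, T_next

# Input signal S_{0,n} of length N = 2^M (here M = 3).
S0 = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

S, details = S0, []
while len(S) > 1:                               # iterate up to scale index M
    S, T = analysis_step(S)
    details.append(T)

print("final approximation S_{M,0}:", S)        # scaled overall average of the signal
print("details per level:", [d.tolist() for d in details])
```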
Multi-resolution analysis (MRA) decomposes the vector space L^2(\mathbb{R}) into a set of nested subspaces:

\ldots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \ldots, \qquad V_j \subset V_{j+1} \ \text{for all } j \in \mathbb{Z}    (4.14)

The union of these subspaces is dense in L^2(\mathbb{R}), and their intersection contains only the zero vector, that is, \bigcap_{j \in \mathbb{Z}} V_j = \{0\}.
Consider an N-dimensional linear vector space of N-dimensional real-valued vectors, spanned by the linearly independent basis vectors a_1, a_2, a_3, \ldots, a_N; call this space V_N. Every vector x = [x_1, x_2, \ldots, x_N] in this space can be represented as a linear combination of the foregoing basis vectors. Next, vectors in this N-dimensional space can be approximated by vectors in a subspace of lower dimension, say N − 1. The set of all linear combinations of just the N − 1 basis vectors a_1, a_2, \ldots, a_{N−1} forms a vector space V_{N−1}, which is a subspace of V_N.
Similarly, by dropping the last basis vector at every step, the subspace V_{N−2} with dimension N − 2, the subspace V_{N−3} with dimension N − 3, and so on can be constructed, down to V_1 with dimension 1. The subspace V_1 has the single basis vector a_1. These vector spaces form a nested sequence of subspaces V_1 \subset V_2 \subset V_3 \subset \ldots \subset V_{N−1} \subset V_N. Now it is required to approximate a vector x in V_N by a vector in V_{N−1}, while keeping the error between the original vector x and the new vector (call it x_{N−1}) in the space V_{N−1} as small as possible. The error is minimized by minimizing the length of the error vector e_{N−1} = x − x_{N−1}, which is achieved when

\langle e_{N-1}, a_k \rangle = 0, \qquad k = 1, 2, \ldots, N-1    (4.15)
As shown in Fig. 4.9, x_{N−1} is the orthogonal projection of x onto the vector space V_{N−1}. If this process of projection is continued through the entire sequence of subspaces, x_{N−1}, x_{N−2}, and so on can be computed, giving a sequence of error vectors e_{N−1}, e_{N−2}, e_{N−3}, \ldots, e_1. Each error vector represents the amount of detail lost in the corresponding approximation. In this process, the vector x is represented at various levels of resolution in different spaces [7]. The difference between the subspaces V_N and V_{N−1} can be described by another subspace; call it W_{N−1}. The part of x that is lost in the projection lies in this subspace, so W_{N−1} contains the detail components. The error vectors e_{N−1}, e_{N−2}, e_{N−3}, \ldots, e_1 form an orthogonal set belonging to the one-dimensional spaces W_{N−1}, W_{N−2}, \ldots, W_1. Mathematically, this can be represented as V_N = V_{N−1} \oplus W_{N−1}, V_{N−1} = V_{N−2} \oplus W_{N−2}, and so on (Fig. 4.10). If this process is extended to infinity, then the final average is zero and the signal is decomposed entirely into detail components. Then
L^2(\mathbb{R}) = \bigoplus_{j=-\infty}^{\infty} W_j    (4.16)

V_N = V_{N-1} \oplus W_{N-1} = V_{N-2} \oplus W_{N-2} \oplus W_{N-1} = \ldots = V_0 \oplus W_0 \oplus W_1 \oplus \ldots \oplus W_{N-2} \oplus W_{N-1}    (4.17)
where the subspace V_0 contains the last average or final approximation component, and the subspaces W_0, W_1, W_2, \ldots, W_{N−2}, W_{N−1} contain the detail vectors or error vectors. It can be seen that the original vector x is reconstructed from the final approximation vector and the detail vectors:
x = x_1 + e_1 + e_2 + e_3 + \ldots + e_{N-1}    (4.18)

Multi-resolution analysis thus involves approximation of functions in a sequence of nested linear vector spaces (Fig. 4.10).
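The following small sketch illustrates Equations 4.15 through 4.18 numerically, assuming Haar-style nested spaces in which each projection onto a coarser space replaces blocks of samples by their common average; the eight-sample vector is chosen only for illustration.

```python
import numpy as np

# Assumed nested (Haar-style) spaces: each projection onto the next coarser space
# replaces pairs of blocks by their common average; the error vector is the detail lost.
x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

current, errors, block = x.copy(), [], 1
while block < len(x):
    # Orthogonal projection of `current` onto the coarser space (blocks of size 2*block).
    coarse = current.reshape(-1, 2 * block).mean(axis=1).repeat(2 * block)
    errors.append(current - coarse)      # error vector e at this level (cf. Eq. 4.15)
    current = coarse
    block *= 2

x_final = current                        # final approximation: the overall mean vector
# Eq. (4.18): the original vector is the final approximation plus all detail vectors.
assert np.allclose(x, x_final + sum(errors))
# Each error vector is orthogonal to the final approximation (cf. Eq. 4.15).
print([round(float(np.dot(e, x_final)), 12) for e in errors])   # all zeros
```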
Fig. 4.9 Vector diagram (x, its orthogonal projection x_{N−1} onto V_{N−1}, and the error vector e)
Fig. 4.10 Vector space decomposition (V_N, V_{N−1}, V_{N−2}, and the detail subspaces W_{N−1}, W_{N−2}, W_{N−3})
If a function x(t) belongs to the space V_0, then x(2t) \in V_1 and x(t/2) \in V_{−1}. This scaling property says that the function dilated by a factor of two belongs to the adjacent coarser subspace, and the function compressed by a factor of two belongs to the adjacent finer subspace. There exists a function known as the scaling function \phi(t) such that the translates \phi(t − k) form a basis for V_0. Translations and dilations of this basis function can represent an approximation of any function f(t) [8]. Figure 4.11 shows the spaces and resolution levels.
References

1. Dhar, P. K., & Shimamura, T. (2015). Advances in audio watermarking based on singular value decomposition. Cham: Springer.
2. Thanki, R., Borisagar, K., & Borra, S. (2018). Advance compression and watermarking technique for speech signals. Cham: Springer.
3. Bebis, G. (2001). Short time Fourier transform (STFT). Image processing fundamentals, CS474/674.
4. Resnikoff, H. L., & Wells, R. O., Jr. (2012). Wavelet analysis: The scalable structure of information. Cham: Springer.
5. Leisenberg, M. (1995). Hearing aids for the profoundly deaf based on neural net speech processing. In ICASSP-95, 1995 IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 5, pp. 3535–3538). Piscataway: IEEE.
6. Agbinya, J. I. (1996). Discrete wavelet transform techniques in speech processing. In TENCON '96. Proceedings, 1996 IEEE TENCON. Digital Signal Processing Applications (Vol. 2, pp. 514–519). Piscataway: IEEE.
7. Xueying, Z., & Zhiping, J. (2004). Speech recognition based on auditory wavelet packet filter. In Proceedings of the 7th International Conference on Signal Processing, 2004. ICSP '04 (Vol. 1, pp. 695–698).
8. Agbinya, J. I. (1996). Discrete wavelet transform techniques in speech processing. In TENCON '96. Proceedings, 1996 IEEE TENCON. Digital Signal Processing Applications (Vol. 2, pp. 514–519).
Fig. 4.11 Space and resolution level (x(t) in V_0, x(2t) in V_1, x(t/2) in V_{−1})