Chapter Title: Fourier Transform, Short-Time Fourier Transform, and Wavelet Transform
Copyright Year: 2019
Copyright Holder: Springer Nature Switzerland AG

Corresponding Author: Komal R. Borisagar, E. C. Department, Atmiya Institute of Technology and Science, Rajkot, Gujarat, India. Email: [email protected]
Author: Rohit M. Thanki, C. U. Shah University, Wadhwan City, Gujarat, India
Author: Bhavin S. Sedani, E. C. Department, L. D. Engineering College, Ahmedabad, India
Abstract: This chapter presents information about the Fourier transform (FT), the short-time Fourier transform (STFT), and the wavelet transform. It also covers the use of these transforms in speech signal processing.
Keywords: Signal transformation - Fourier transform (FT) - Short-time Fourier transform (STFT) - Wavelet
Chapter 4
Fourier Transform, Short-Time Fourier Transform, and Wavelet Transform

4.1 Fourier Transform (FT)
The Fourier transform (FT) transforms a time domain speech signal into its corresponding frequency domain representation. The FT of a speech signal can be calculated using Equation 4.1 [1, 2]. The FT gives complex-valued coefficients of the speech signal.
S(k) = \sum_{n=0}^{N-1} S(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \ldots, N-1    (4.1)
where S(n) is the input speech signal in the time domain and S(k) is the transformed speech signal in the frequency domain.
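As an illustration of Equation 4.1, the following sketch computes the coefficients S(k) directly and compares them with NumPy's FFT. The sampling rate and the synthetic two-tone "speech frame" are assumptions made purely for the example.

```python
import numpy as np

# Hypothetical speech frame: a 250 Hz tone plus a weaker 1 kHz tone (assumed values).
fs = 8000                         # sampling rate in Hz
N = 256                           # frame length
n = np.arange(N)
s = np.sin(2 * np.pi * 250 * n / fs) + 0.3 * np.sin(2 * np.pi * 1000 * n / fs)

# Direct evaluation of Eq. (4.1): S(k) = sum_n S(n) exp(-j 2 pi n k / N)
k = np.arange(N)
S = np.array([np.sum(s * np.exp(-2j * np.pi * n * kk / N)) for kk in k])

# The FFT evaluates the same sum efficiently; both give complex-valued coefficients.
assert np.allclose(S, np.fft.fft(s))

# Bin k corresponds to frequency k * fs / N; the two largest magnitudes in the
# lower half of the spectrum sit at the two tone frequencies (250 Hz and 1000 Hz).
freqs = k * fs / N
peaks = np.argsort(np.abs(S[:N // 2]))[-2:]
print(np.sort(freqs[peaks]))
```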
According to the literature [3], the FT has some limitations when it is applied to a speech signal:
• The FT cannot provide simultaneous time and frequency localization.
• The FT is not very useful for analyzing time-variant, nonstationary signals.
• The FT is not appropriate for representing discontinuities in the signal.
4.2 Short-Time FT
The short-time FT (STFT) segments the speech signal into narrow time intervals and takes the FT of each segment. Each FT then provides the spectral information of one segment, so that simultaneous time and frequency information is obtained [3]. The steps for applying the STFT to a speech signal are as follows:
• Choose a window function of finite length.
• Place the window on top of the signal at t = 0.
• Truncate the signal using this window.
• Compute the FT of the truncated signal and save the results.
• Incrementally slide the window to the right.
• Go to step 3 and repeat the process until the window reaches the end of the signal.
The STFT of the speech signal can be calculated by using Equation 4.2 [3].
\mathrm{STFT}_f(t', u) = \int_t \left[ f(t)\, W(t - t') \right] e^{-j 2\pi u t}\, dt    (4.2)
where t is the time parameter of the signal, u is the frequency parameter of the signal, f(t) is the input signal, and W is the windowing function.
In the STFT, the windowing function plays an important role. The window should be narrow enough that the portion of the signal falling within it can be regarded as stationary. However, a narrow window does not give good localization of the signal in the frequency domain. If the window function is infinitely long, the STFT turns into the FT and provides good frequency localization but no time localization. If the window function is infinitely short, the STFT provides good time localization but no frequency localization. This trade-off is one of the limitations of the STFT when it is applied to a speech signal, and it is overcome by the wavelet transform.
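As a concrete illustration of the steps listed above, the sketch below applies an STFT to a synthetic nonstationary signal using SciPy. The sampling rate, tone frequencies, and the 32 ms Hann window are assumptions chosen only for the demonstration.

```python
import numpy as np
from scipy.signal import stft

# Assumed test signal: 300 Hz for the first half second, 1200 Hz for the second half.
fs = 8000
t = np.arange(fs) / fs
x = np.where(t < 0.5,
             np.sin(2 * np.pi * 300 * t),
             np.sin(2 * np.pi * 1200 * t))

# Hann window of 256 samples (32 ms) with 50% overlap: the window slides along the
# signal and an FT is taken of each windowed segment, as in the steps listed above.
f, tau, Zxx = stft(x, fs=fs, window='hann', nperseg=256, noverlap=128)

# Zxx holds one FT per window position: rows are frequency bins, columns are times.
early = np.argmax(np.abs(Zxx[:, 5]))      # a frame near the start of the signal
late = np.argmax(np.abs(Zxx[:, -5]))      # a frame near the end of the signal
print(f[early], f[late])                  # approximately 300 Hz and 1200 Hz
```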
4.3 Wavelet Transform (WT)
Wavelet transform analysis uses small waves, or functions, known as wavelets. A wavelet can be described more precisely as a localized wavelike function. Some of the most commonly used wavelets are shown in Fig. 4.1. Using wavelets, any signal can be transformed from one representation into another in which more useful information can be found; this process is known as the wavelet transform. Mathematically, the wavelet transform can be described as the convolution between the wavelet function and the signal under observation.
Fig. 4.1 Some basic wavelets
A wavelet can be manipulated in two ways: it can be translated to different locations on the signal, and it can be stretched or compressed (dilated) as required. The wavelet transform measures, on a local basis, how well the signal matches the wavelet; the scheme is represented in Fig. 4.2. As shown in Fig. 4.3, whenever the shape of the signal matches the wavelet at a specific scale and location, a large transform value is observed. On the other hand, if the match is poor, a low transform value is obtained. The transform value is then placed, as shown by a black dot in Fig. 4.4, on the plane of the two-dimensional transform. Computing the transform for various scales of the wavelet and at various locations of the signal is done continuously for the continuous wavelet transform (CWT) or in discrete steps for the discrete wavelet transform [4].
The wavelet transform translates a signal into another form that makes certain parameters of the original signal easier to analyze, and it provides a picture of the correlation between the signal and the wavelet at different scales and locations. Computing a wavelet transform requires a small wavelike function which, as the name suggests, is a waveform limited to a small region. A wavelet is a function that satisfies certain mathematical criteria. These functions are manipulated through a process of translation and dilation to change the signal into a different form that 'unfolds' it in time and scale. The signal to be analyzed is typically a temporal signal, that is, a function that varies with time, such as the velocity of a fluid, engine-induced vibration data, or an ECG signal. In some applications the independent variable is space rather than time, but the analysis is carried out in the same way. Such a small wave is shown in Fig. 4.5; it is localized on the time axis. There are many types of wavelets available to select from for data analysis. The best choice for a specified application generally depends on the nature of the signal and what is required from the analysis [4].
Fig. 4.2 Location of wavelet
Fig. 4.3 Scale of wavelet
Fig. 4.4 Wavelet function, speech signal, and transform
Fig. 4.5 Four wavelets (a) Gaussian wave (b) Mexican Hat (c) Haar (d) Morlet
The Mexican hat wavelet is one of the important wavelet functions; it illustrates many properties of continuous wavelet transform (CWT) analysis. The Mexican hat wavelet is defined as

\Psi(t) = (1 - t^2)\, e^{-t^2/2}    (4.3)
The wavelet function given by the foregoing equation is known as the mother wavelet. The basic requirements for any function to serve as a wavelet are as follows:

1. The wavelet must have finite energy:

E = \int_{-\infty}^{\infty} |\Psi(t)|^2 \, dt < \infty    (4.4)

Here the energy is defined as the integral of the squared magnitude of Ψ(t) over all time. For complex Ψ(t), the energy must be computed from both the magnitude and the phase of Ψ(t) [5].
2. If \hat{\Psi}(f) = \int_{-\infty}^{\infty} \Psi(t)\, e^{-i(2\pi f)t} \, dt is the Fourier transform of Ψ(t), then the following condition must hold:

C_g = \int_{0}^{\infty} \frac{|\hat{\Psi}(f)|^2}{f} \, df < \infty    (4.5)

This condition implies that \hat{\Psi}(0) = 0, that is, the wavelet has no zero-frequency component; equivalently, the wavelet Ψ(t) must have zero mean. Equation 4.5 is known as the admissibility condition, and C_g is known as the admissibility constant. Its value depends on the chosen wavelet.
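The following sketch checks these two conditions numerically for the Mexican hat wavelet of Equation 4.3 and then computes a simple CWT by correlating the signal with translated, dilated copies of the wavelet. The test signal, scale range, and normalization are assumptions made for illustration, not part of the original text.

```python
import numpy as np

def mexican_hat(t):
    """Mother wavelet of Eq. (4.3): psi(t) = (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

# Numerical check of Eq. (4.4) (finite energy) and the zero-mean requirement of Eq. (4.5).
t = np.linspace(-8.0, 8.0, 4001)
dt = t[1] - t[0]
psi = mexican_hat(t)
print("energy:", np.sum(np.abs(psi) ** 2) * dt)    # finite, about 1.33
print("mean:  ", np.sum(psi) * dt)                 # approximately zero

# Minimal CWT sketch: for each scale a, correlate the signal with psi((t - b)/a)/sqrt(a).
fs = 1000.0
ts = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 20 * ts) + 0.5 * np.sin(2 * np.pi * 80 * ts)

scales = np.arange(1, 32)
cwt = np.empty((len(scales), len(signal)))
for i, a in enumerate(scales):
    k = np.arange(-8 * a, 8 * a + 1)               # grid wide enough for the wavelet to decay
    wavelet = mexican_hat(k / a) / np.sqrt(a)
    cwt[i] = np.convolve(signal, wavelet[::-1], mode='same') / fs

# Each row is the transform at one scale; large magnitudes mark locations where the
# dilated wavelet matches the local shape of the signal.
print(cwt.shape)                                    # (31, 1000)
```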
4.4 Comparison of the Wavelet Transform (WT) with FT and STFT
Wavelet analysis compares the signal at several magnifications, that is, with distinct resolutions. Fourier analysis is performed using basic building blocks, also known as time–frequency atoms, namely sine and cosine waves. In wavelet analysis there are two kinds of wavelets, the mother wavelet and the child wavelets. The mother wavelet oscillates and is translated and dilated to generate the child wavelets; together they serve as the building blocks of wavelet analysis.
The Fourier series is useful only for periodic signals; the Fourier transform can be used for frequency analysis of general nonperiodic functions. The Fourier transform is used for analyzing a time domain signal in the frequency domain, and the processing is carried out in three steps. First, the signal is transformed from the time domain to the frequency domain. Next, the frequency domain coefficients are modified according to the requirement. Last, the effect of the modification is viewed in the time domain by applying the inverse transform, which converts the frequency domain signal back into the time domain. Here the Fourier coefficients represent the contributions of cosine and sine functions of different frequencies. The FT of the signal f(t) is given by
F(\omega) = \int f(t)\, e^{-i\omega t} \, dt    (4.6)
The inverse Fourier transform performs the reverse operation, converting data from the frequency domain back to the time domain. The inverse FT is represented by
f(t) = \frac{1}{2\pi} \int F(\omega)\, e^{i\omega t} \, d\omega    (4.7)
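A small numerical sketch of the three-step procedure described above (transform, modify, inverse transform) is given below, using NumPy's discrete FFT and IFFT as stand-ins for Equations 4.6 and 4.7. The test signal and the 100 Hz cutoff are assumed for the example.

```python
import numpy as np

fs = 1000
t = np.arange(fs) / fs
f_t = np.sin(2 * np.pi * 50 * t) + 0.4 * np.sin(2 * np.pi * 300 * t)

# Step 1: transform the signal from the time domain to the frequency domain.
F = np.fft.fft(f_t)
freqs = np.fft.fftfreq(len(f_t), d=1 / fs)

# Step 2: modify the coefficients, here zeroing everything above 100 Hz (a crude low-pass).
F_mod = np.where(np.abs(freqs) <= 100, F, 0)

# Step 3: apply the inverse transform to view the effect back in the time domain.
f_filtered = np.real(np.fft.ifft(F_mod))

# Without modification, the inverse transform recovers the original signal.
assert np.allclose(f_t, np.real(np.fft.ifft(F)))
print(np.max(np.abs(f_filtered)))    # about 1.0: the 300 Hz component has been removed
```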
The analysis coefficients F(ω), which describe the notion of global frequency in the signal, are calculated as inner products of the signal with sinusoidal basis functions of unbounded duration. Hence, the foregoing Fourier transform works well for signals composed of a few stationary components. Unpredictable changes in time in a nonstationary signal f(t), however, spread out over the entire frequency axis in F(ω). To overcome this problem, the short-time Fourier transform (STFT) can be used [6].
The dependence of frequency on time can be described by the instantaneous frequency. If the signal is broadband, however, the instantaneous frequency merely averages the different spectral components present at each time. A two-dimensional time–frequency representation is therefore required to describe and observe how the spectral characteristics of the signal depend on time.
Now consider the signal f(t) viewed through a window function g(t) of limited duration, centered at time location τ. In that case, the STFT of the signal can be defined as

\mathrm{STFT}(\tau, \omega) = \int_{-\infty}^{\infty} f(t)\, g^{*}(t - \tau)\, e^{-i\omega t} \, dt    (4.8)
The STFT maps the signal into a two-dimensional function on the time–frequency plane, and its performance critically depends on the window function g(t). Figure 4.6 shows the time–frequency plane corresponding to the STFT; the vertical stripe represents windowing in the time domain, with the STFT frequencies computed around the window placed at time t. An alternative view of the same procedure is based on a filter bank interpretation. Consider now the capacity of the STFT to differentiate between two pure sinusoids.
Let G(f) denote the Fourier transform of the windowing function g(t). The bandwidth Δf of the filter can be defined as

\Delta f^2 = \frac{\int f^2\, |G(f)|^2 \, df}{\int |G(f)|^2 \, df}    (4.9)
where the denominator \int |G(f)|^2 \, df represents the energy of g(t). Two sinusoids can be differentiated only if their separation is more than Δf, which is known as the frequency resolution of the STFT analysis.
Likewise, the spread in time is represented by Δt, defined analogously with the denominator \int |g(t)|^2 \, dt giving the energy of g(t). Two pulses in time can be discriminated only if their separation is more than Δt, which is known as the time resolution of the STFT analysis. Because the product of the time resolution and the frequency resolution is lower bounded by the Heisenberg uncertainty principle given below, the two resolutions cannot both be made arbitrarily small:
\text{Time-bandwidth product} = \Delta t \, \Delta f \geq \frac{1}{4\pi}    (4.10)

Thus, the only possibility is to trade frequency resolution for time resolution, or vice versa. Gaussian windows meet the above bound with equality and are therefore the most widely used. In the STFT, once a window has been selected, the time–frequency resolution is fixed over the whole time–frequency plane, because the same window is used for all frequencies.
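The bound in Equation 4.10 can be checked numerically for a Gaussian window using the definitions of Δt and Δf given above. The window width and sampling grid below are assumptions chosen only for the demonstration.

```python
import numpy as np

sigma = 0.01                       # assumed 10 ms Gaussian window
dt = 1e-5
t = np.arange(-0.2, 0.2, dt)
g = np.exp(-t ** 2 / (2 * sigma ** 2))

# RMS duration: Delta_t^2 = int t^2 |g(t)|^2 dt / int |g(t)|^2 dt
delta_t = np.sqrt(np.sum(t ** 2 * np.abs(g) ** 2) / np.sum(np.abs(g) ** 2))

# RMS bandwidth from Eq. (4.9), computed from the window spectrum G(f).
G = np.fft.fft(g)
f = np.fft.fftfreq(len(t), d=dt)
delta_f = np.sqrt(np.sum(f ** 2 * np.abs(G) ** 2) / np.sum(np.abs(G) ** 2))

# The Gaussian meets the uncertainty bound with (near) equality.
print(delta_t * delta_f, 1 / (4 * np.pi))    # both approximately 0.0796
```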
Fig. 4.6 Time–frequency plane for STFT (modulated filter bank view)
A nonperiodic signal f(t) cannot be accurately represented by a sum of periodic sine and cosine functions. One solution is to artificially extend the signal to make it periodic, but even that requires continuity at the endpoints. The windowed Fourier transform (WFT) is another solution that gives time and frequency information simultaneously. For this purpose, the signal is first divided into small segments, and the WFT of each segment is computed for separate frequency analysis. To handle short transitions in the signal, windowing is applied so that each section converges to zero at its endpoints; this is accomplished by a weight function that places more emphasis on the middle than on the endpoints. The effect of the window is to localize the signal in time. The DFT and DWT are both linear transforms that generate a data structure of segments of various lengths, and the mathematical properties of the matrices involved are similar. Both the DFT and the DWT can be viewed as rotations into a different domain: for the DFT the new domain contains cosine and sine basis functions, and for the wavelet transform it contains the mother wavelet. The key difference between the two transforms is that the sine and cosine functions of the FT are not localized in space, whereas each individual wavelet function of the WT is localized in space (Fig. 4.7).
Because of its sparseness properties, the wavelet transform can be used for applications such as data compression, noise removal, and image processing. The time–frequency resolution is another major difference between the FT and the WT. Figure 4.8 shows the time–frequency resolution of the short-time Fourier transform and the wavelet transform.
Fig. 4.7 Time–frequency plane for Fourier basis functions

The STFT is not well suited to analyzing real signals that contain low-frequency as well as high-frequency content and whose frequency content varies with time. To overcome the time and frequency resolution limit of the STFT, wavelet analysis can be used, which allows the resolutions Δt and Δf to vary over the time–frequency plane (Fig. 4.8). Thus, multi-resolution analysis can be achieved with the wavelet transform. Looking at wavelet analysis from the filter bank point of view, the time resolution must vary with the central frequency of the analysis. In the wavelet transform the frequency span Δf is directly proportional to the central frequency f0:

\frac{\Delta f}{f_0} = \text{constant}    (4.11)
The wavelet representation is therefore realized as a filter bank of constant-Q bandpass filters. In this case, the time resolution also changes with the central frequency. The Heisenberg uncertainty principle is still satisfied, but now the time resolution becomes arbitrarily good at high frequencies, whereas the frequency resolution becomes arbitrarily good at low frequencies. Thus, wavelet analysis offers both time and frequency selectivity: two short bursts can be separated by selecting higher analysis frequencies, which increases the time resolution. Wavelet analysis is therefore appropriate when the signal contains high-frequency components of very short duration together with low-frequency components of long duration. In the wavelet transform the size of the window is not constant; it varies [6].
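The constant-Q behavior of Equation 4.11 can be illustrated numerically: dilating a wavelet scales its center frequency and its bandwidth by the same factor, so their ratio stays fixed. The Mexican hat wavelet, the scales, and the sampling grid below are assumed for the demonstration.

```python
import numpy as np

def mexican_hat(t):
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

dt = 0.01
t = np.arange(-500.0, 500.0, dt)

for a in (1.0, 2.0, 4.0, 8.0):
    psi = mexican_hat(t / a)                       # wavelet dilated by scale a
    P = np.abs(np.fft.rfft(psi)) ** 2              # its power spectrum
    f = np.fft.rfftfreq(len(t), d=dt)
    f0 = np.sum(f * P) / np.sum(P)                 # spectral centroid = center frequency
    bw = np.sqrt(np.sum((f - f0) ** 2 * P) / np.sum(P))   # RMS bandwidth around f0
    # The ratio bw / f0 is (approximately) the same at every scale.
    print(f"scale {a:4.1f}: f0 = {f0:.4f} Hz, bandwidth = {bw:.4f} Hz, ratio = {bw / f0:.3f}")
```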
4.5 Multiresolution Algorithm
The wavelet transform is also referred to as a multi-resolution algorithm [4]. To compute the approximation coefficients S_{m,n} and the discrete wavelet (detail) coefficients T_{m,n} of an input signal S_{0,n} using the decomposition algorithm, one first computes T_{1,n} and S_{1,n} from the input coefficients S_{0,n} as follows:
S_{1,n} = \frac{1}{\sqrt{2}} \sum_{k} c_k S_{0,\,2n+k}, \qquad T_{1,n} = \frac{1}{\sqrt{2}} \sum_{k} b_k S_{0,\,2n+k}    (4.12)
Fig. 4.8 Time–frequency plane for the basis functions of wavelet analysis

In the same way, S_{2,n} and T_{2,n} can be calculated from S_{1,n}:
S_{2,n} = \frac{1}{\sqrt{2}} \sum_{k} c_k S_{1,\,2n+k}, \qquad T_{2,n} = \frac{1}{\sqrt{2}} \sum_{k} b_k S_{1,\,2n+k}    (4.13)
Next, from the approximation coefficients S_{2,n}, one can find S_{3,n} and T_{3,n}, and so on up to scale index M, where one computes the single values S_{M,0} and T_{M,0}. For an input signal of length N = 2^M, the coefficient at scale index m and position index n corresponds to the dilation a = 2^m and the translation b = 2^m n. The indices of the coefficients in the full decomposition therefore range over 1 ≤ m ≤ M and 0 ≤ n ≤ 2^{M−m} − 1.
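A minimal numerical sketch of this decomposition is given below, assuming the Haar filter coefficients c = [1, 1] and b = [1, −1] (longer filter sequences are used for other wavelets); the eight-sample input signal is chosen only for illustration.

```python
import numpy as np

def analysis_step(S):
    """One level of Eqs. (4.12)-(4.13) with Haar filters: returns (S_{m+1,n}, T_{m+1,n})."""
    S = np.asarray(S, dtype=float)
    S_next = (S[0::2] + S[1::2]) / np.sqrt(2)   # (1/sqrt(2)) * sum_k c_k S_{m, 2n+k}
    T_next = (S[0::2] - S[1::2]) / np.sqrt(2)   # (1/sqrt(2)) * sum_k b_k S_{m, 2n+k}
    return S_next, T_next

# Input signal S_{0,n} of length N = 2^M (here M = 3).
S0 = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

S, details = S0, []
while len(S) > 1:                               # iterate up to scale index M
    S, T = analysis_step(S)
    details.append(T)

print("final approximation S_{M,0}:", S)        # scaled overall average of the signal
print("details per level:", [d.tolist() for d in details])
```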
Multi-resolution analysis (MRA) decomposes the vector space L^2(\mathbb{R}) into a set of nested subspaces:

\ldots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \ldots, \qquad V_j \subset V_{j+1} \ \text{for all } j \in \mathbb{Z}    (4.14)

The union of these subspaces is dense in L^2(\mathbb{R}), and their intersection contains only the zero vector, that is, \bigcap_{j \in \mathbb{Z}} V_j = \{0\}.
Consider an N-dimensional linear vector space of N-dimensional real-valued vectors, spanned by the linearly independent basis vectors a_1, a_2, a_3, \ldots, a_N; call this space V_N. Every vector x = [x_1, x_2, \ldots, x_N] in this space can be represented as a linear combination of the foregoing basis vectors. Next, vectors in this N-dimensional space can be approximated by vectors in a subspace of lower dimension, say N − 1. The set of all linear combinations of just the N − 1 basis vectors a_1, a_2, \ldots, a_{N−1} forms a vector space V_{N−1}, which is a subspace of V_N.
Similarly, by dropping the last basis vector at every step, the subspace V_{N−2} with dimension N − 2, the subspace V_{N−3} with dimension N − 3, and so on can be constructed, down to V_1 with dimension 1. The subspace V_1 has the single basis vector a_1. These vector spaces form a nested sequence of subspaces V_1 \subset V_2 \subset V_3 \subset \ldots \subset V_{N−1} \subset V_N. Now it is required to approximate a vector x in V_N by a vector in V_{N−1}, while keeping the error between the original vector x and the new vector (call it x_{N−1}) in the space V_{N−1} as small as possible. The error is minimized by minimizing the length of the error vector e_{N−1} = x − x_{N−1}, which is achieved when

\langle e_{N-1}, a_k \rangle = 0, \qquad k = 1, 2, \ldots, N-1    (4.15)
As shown in Fig. 4.9, x_{N−1} is the orthogonal projection of x onto the vector space V_{N−1}. If this process of projection is continued through the entire sequence of subspaces, x_{N−1}, x_{N−2}, and so on can be computed, giving a sequence of error vectors e_{N−1}, e_{N−2}, e_{N−3}, \ldots, e_1. Each error vector represents the amount of detail lost in the corresponding approximation. In this process, the vector x is represented at various levels of resolution in different spaces [7]. The difference between the subspaces V_N and V_{N−1} can be described by another subspace; call it W_{N−1}. The part of x that is lost in the projection lies in this subspace, so W_{N−1} contains the detail components. The error vectors e_{N−1}, e_{N−2}, e_{N−3}, \ldots, e_1 form an orthogonal set belonging to the one-dimensional spaces W_{N−1}, W_{N−2}, \ldots, W_1. Mathematically, this can be represented as V_N = V_{N−1} \oplus W_{N−1}, V_{N−1} = V_{N−2} \oplus W_{N−2}, and so on (Fig. 4.10). If this process is extended to infinity, then the final average is zero and the signal is decomposed entirely into detail components. Then
L^2(\mathbb{R}) = \bigoplus_{j=-\infty}^{\infty} W_j    (4.16)

V_N = V_{N-1} \oplus W_{N-1} = V_{N-2} \oplus W_{N-2} \oplus W_{N-1} = \ldots = V_0 \oplus W_0 \oplus W_1 \oplus \ldots \oplus W_{N-2} \oplus W_{N-1}    (4.17)
where the subspace V_0 contains the last average or final approximation component, and the subspaces W_0, W_1, W_2, \ldots, W_{N−2}, W_{N−1} contain the detail vectors or error vectors. It can be seen that the original vector x is reconstructed from the final approximation vector and the detail vectors:
x = x_1 + e_1 + e_2 + e_3 + \ldots + e_{N-1}    (4.18)

Multi-resolution analysis thus involves approximation of functions in a sequence of nested linear vector spaces (Fig. 4.10).
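The following small sketch illustrates Equations 4.15 through 4.18 numerically, assuming Haar-style nested spaces in which each projection onto a coarser space replaces blocks of samples by their common average; the eight-sample vector is chosen only for illustration.

```python
import numpy as np

# Assumed nested (Haar-style) spaces: each projection onto the next coarser space
# replaces pairs of blocks by their common average; the error vector is the detail lost.
x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

current, errors, block = x.copy(), [], 1
while block < len(x):
    # Orthogonal projection of `current` onto the coarser space (blocks of size 2*block).
    coarse = current.reshape(-1, 2 * block).mean(axis=1).repeat(2 * block)
    errors.append(current - coarse)      # error vector e at this level (cf. Eq. 4.15)
    current = coarse
    block *= 2

x_final = current                        # final approximation: the overall mean vector
# Eq. (4.18): the original vector is the final approximation plus all detail vectors.
assert np.allclose(x, x_final + sum(errors))
# Each error vector is orthogonal to the final approximation (cf. Eq. 4.15).
print([round(float(np.dot(e, x_final)), 12) for e in errors])   # all zeros
```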
Fig. 4.9 Vector diagram (x, its orthogonal projection x_{N−1} onto V_{N−1}, and the error vector e)
Fig. 4.10 Vector space decomposition (V_N, V_{N−1}, V_{N−2}, and the detail subspaces W_{N−1}, W_{N−2}, W_{N−3})
If a function x(t) belongs to the space V_0, then x(2t) \in V_1 and x(t/2) \in V_{−1}. This scaling property says that the function dilated by a factor of two belongs to the adjacent coarser subspace, and the function compressed by a factor of two belongs to the adjacent finer subspace. There exists a function known as the scaling function \phi(t) such that the translates \phi(t − k) form a basis for V_0. Translations and dilations of this basis function can represent an approximation of any function f(t) [8]. Figure 4.11 shows the spaces and resolution levels.
References

1. Dhar, P. K., & Shimamura, T. (2015). Advances in audio watermarking based on singular value decomposition. Cham: Springer.
2. Thanki, R., Borisagar, K., & Borra, S. (2018). Advance compression and watermarking technique for speech signals. Cham: Springer.
3. Bebis, G. (2001). Short time Fourier transform (STFT). Image processing fundamentals, CS474/674.
4. Resnikoff, H. L., & Wells, R. O., Jr. (2012). Wavelet analysis: The scalable structure of information. Cham: Springer.
5. Leisenberg, M. (1995). Hearing aids for the profoundly deaf based on neural net speech processing. In ICASSP-95, 1995 IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 5, pp. 3535–3538). Piscataway: IEEE.
6. Agbinya, J. I. (1996). Discrete wavelet transform techniques in speech processing. In TENCON '96. Proceedings, 1996 IEEE TENCON. Digital Signal Processing Applications (Vol. 2, pp. 514–519). Piscataway: IEEE.
7. Xueying, Z., & Zhiping, J. (2004). Speech recognition based on auditory wavelet packet filter. In Proceedings of the 7th International Conference on Signal Processing, 2004. ICSP '04 (Vol. 1, pp. 695–698).
8. Agbinya, J. I. (1996). Discrete wavelet transform techniques in speech processing. In TENCON '96. Proceedings, 1996 IEEE TENCON. Digital Signal Processing Applications (Vol. 2, pp. 514–519).
Fig. 4.11 Space and resolution level (x(t) in V_0, x(2t) in V_1, x(t/2) in V_{−1})