2.1 Classification of Audio Descriptors
We can classify audio features broadly into two classes. Global descriptors are features computed on the complete signal as a whole; e.g., the attack time of a sound is known only from the complete duration of the audio signal. Instantaneous descriptors form the other class, computed on a short segment (say 40 ms) of the audio signal called a frame. The spectral centroid of a signal can vary with time, hence it is termed an instantaneous descriptor.
As an instantaneous descriptor produces one value per frame, statistical operations (like the mean or median, the standard deviation, or the inter-quartile range) are essential to derive a single-value representation. In the CUIDADO project [10], a listing of 166 audio features is provided.
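As a minimal illustration of such summarization (not MIR toolbox code; a NumPy sketch with hypothetical function names), the frame-wise values of an instantaneous descriptor can be collapsed into the statistics mentioned above:

```python
import numpy as np

def summarize(frame_values):
    """Collapse a frame-wise (instantaneous) descriptor into scalar statistics."""
    v = np.asarray(frame_values, dtype=float)
    return {
        "mean": float(np.mean(v)),
        "median": float(np.median(v)),
        "std": float(np.std(v)),
        "iqr": float(np.percentile(v, 75) - np.percentile(v, 25)),
    }
```

The median and inter-quartile range are robust to outlier frames (e.g. silence or clicks), which is one motivation for using them in place of the mean.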
Depending upon the type of process used to extract the feature, we can further differentiate:
• Features extracted directly from the time-domain waveform of the audio signal, like the zero-crossing rate.
• Features extracted after applying a transform (FFT, wavelet, etc.) to the signal, e.g. MFCC.
• Features extracted based on a signal model, like the source-filter model.
• Features that approximate the human auditory response (response on the Bark or ERB scale).
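The zero-crossing rate mentioned in the first category is simple enough to sketch directly. The following is an illustrative NumPy implementation (an assumption of one common definition: the fraction of consecutive sample pairs whose signs differ), not code from the MIR toolbox:

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ."""
    x = np.asarray(x, dtype=float)
    signs = np.signbit(x)          # True where the sample is negative
    return float(np.mean(signs[1:] != signs[:-1]))
```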
2.2 Timbral Audio Descriptors
Timbre is a perceptual and multi-dimensional attribute of sound. As an exact definition of timbre is very difficult, it can instead be analyzed through the following attributes [11].
1. Harmonic analysis: the number, relative strengths, and structure of harmonics.
2. Partial analysis: phase, inharmonic partials, and the content of partials.
3. Time-related parameters, like rise time.
4. Steady-state and attack slices.
34 V. M. Sardar et al.
2.3 Timbral Audio Descriptors in the MIR Toolbox for MATLAB
The Musical Information Retrieval (MIR) toolbox is, for the most part, intended to enable the study of the relation between musical attributes and music-induced emotion.
The MIR toolbox uses a modular design. Common algorithms used in audio processing, like segmentation, filtering, and framing, are combined with one or more distinguishing algorithms at some stage of processing. These algorithms are available in modular form, and the individual blocks can be integrated to compute particular features [12].
The philosophy of integrating the appropriate modules is shown in Fig. 2. For example, to measure irregularity and brightness, we need to implement algorithms like reading audio samples, segmentation, filtering, and framing as the processes common to both. In the final stage, due to inherent differences, irregularity needs a peak-picking algorithm while brightness needs spectrum analysis (Fig. 2).
Even the integration of different stages depends upon parameter variations, e.g. mirregularity (…, 'Jensen'), where the adjoining partials are taken into consideration, and mirregularity (…, 'Krimphoff'), which considers each amplitude minus the mean of the preceding, same, and next amplitudes [13]. The flow diagram of algorithm modules and their integration to extract the selected timbral features is shown in Fig. 2.
Roll-off frequency: Roll-off is the frequency below which most of the spectral energy (85% or 95% as a standard) is contained.
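The roll-off definition above can be sketched as follows; this is an illustrative NumPy version of the standard formula (cumulative energy threshold), not the toolbox's own implementation:

```python
import numpy as np

def spectral_rolloff(magnitudes, freqs, fraction=0.85):
    """Frequency below which `fraction` of the total spectral energy lies."""
    energy = np.asarray(magnitudes, dtype=float) ** 2
    cum = np.cumsum(energy)
    idx = np.searchsorted(cum, fraction * cum[-1])   # first bin reaching the target
    return float(freqs[idx])
```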
Roughness: It estimates the average dissonance between all pairs of peaks of the signal. It is also an indicator of the presence of harmonics, generally higher than the 6th harmonic.
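One common way to compute such a roughness estimate is Sethares' parametrisation of the Plomp-Levelt dissonance curve, summed over all peak pairs. The sketch below assumes that model (the constants 0.24, 0.021, 19, 3.5, and 5.75 come from Sethares' fit) and is illustrative rather than the toolbox's exact code:

```python
import numpy as np

def pair_dissonance(f1, f2, a1, a2):
    """Plomp-Levelt dissonance of one partial pair (Sethares' parametrisation)."""
    f_low, f_high = min(f1, f2), max(f1, f2)
    s = 0.24 / (0.021 * f_low + 19.0)        # scales the critical bandwidth
    d = s * (f_high - f_low)
    return a1 * a2 * (np.exp(-3.5 * d) - np.exp(-5.75 * d))

def roughness(freqs, amps):
    """Sum the dissonance contributions of all spectral-peak pairs."""
    total = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            total += pair_dissonance(freqs[i], freqs[j], amps[i], amps[j])
    return total
```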
Brightness: It measures the percentage of spectral energy above some cut-off frequency.
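Brightness reduces to a ratio of energies; a minimal NumPy sketch follows, assuming a cut-off of 1500 Hz (a commonly used default), again as an illustration rather than the toolbox implementation:

```python
import numpy as np

def brightness(magnitudes, freqs, cutoff=1500.0):
    """Share of spectral energy lying at or above the cut-off frequency."""
    energy = np.asarray(magnitudes, dtype=float) ** 2
    above = np.asarray(freqs, dtype=float) >= cutoff
    return float(energy[above].sum() / energy.sum())
```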
Fig. 2 The logical flow of timbral feature implementation in MIR
Irregularity: It may be calculated as the sum of the squared differences in amplitude between adjoining partials, or as the sum of each amplitude minus the mean of the preceding, same, and subsequent amplitudes.
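Both irregularity variants described above can be sketched directly. The sketch assumes the usual attributions (the first formula to Jensen, normalised by the total squared amplitude, and the second to Krimphoff) and is illustrative NumPy code, not the toolbox's own:

```python
import numpy as np

def irregularity_jensen(amps):
    """Sum of squared differences between adjoining partials over total squared amplitude."""
    a = np.asarray(amps, dtype=float)
    return float(np.sum((a[:-1] - a[1:]) ** 2) / np.sum(a ** 2))

def irregularity_krimphoff(amps):
    """Sum of |amplitude - mean of the preceding, same, and next amplitudes|."""
    a = np.asarray(amps, dtype=float)
    local_mean = (a[:-2] + a[1:-1] + a[2:]) / 3.0
    return float(np.sum(np.abs(a[1:-1] - local_mean)))
```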
miraudio: This command loads the appropriate format of an audio file, e.g. miraudio ('speaker.wav').
mirsegment: This process splits a continuous audio signal into homogeneous segments.
mirfilterbank: A set of filters is required to select neighboring narrow sub-bands that together cover the entire frequency range, while avoiding effects like aliasing in the reconstruction process; e.g. mirfilterbank (…, 'Gammatone') performs a Gammatone filterbank decomposition. The frame decomposition can be performed using the mirframe command. The frames can be specified as follows: mirframe (x, …, 'Length', w, wu).
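The frame decomposition itself is a simple sliding-window operation. A minimal NumPy sketch (lengths given in samples; `frames` is a hypothetical helper name, not a MIR toolbox function):

```python
import numpy as np

def frames(x, length, hop):
    """Split signal x into overlapping frames of `length` samples, advancing by `hop`."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - length + 1, hop)
    return np.stack([x[s:s + length] for s in starts])
```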
mirspectrum: The Discrete Fourier Transform decomposes the energy of a signal (be it an audio waveform, an envelope, etc.) along the frequency axis.
Mathematically, for an audio signal x:

X_k = Σ_{n=0}^{N−1} x_n e^{−2πikn/N},  k = 0, …, N−1   (1)
This decomposition is performed using a Fast Fourier Transform by the mirspectrum function.
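Eq. (1) can be evaluated directly, which is useful as a sanity check against the FFT. The sketch below is a naive O(N²) implementation compared against NumPy's FFT, not the toolbox code:

```python
import numpy as np

def dft(x):
    """Direct evaluation of Eq. (1): X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                     # one row per output bin k
    return (x * np.exp(-2j * np.pi * k * n / N)).sum(axis=1)
```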
mirpeaks: Many features, like irregularity, require peak analysis. Peaks are calculated from any data x produced in the MIR toolbox using the command mirpeaks (x).
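In its simplest form, peak picking amounts to finding strict local maxima. A minimal NumPy sketch of that basic idea (real peak pickers add thresholds and smoothing, which are omitted here):

```python
import numpy as np

def peaks(x):
    """Indices of strict local maxima (samples greater than both neighbours)."""
    x = np.asarray(x, dtype=float)
    interior = (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])
    return np.flatnonzero(interior) + 1      # +1 shifts back to original indexing
```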
2.4 Timbral Features for Better Speaker Identification
For better speaker identification, the following conditions should be satisfied: (i) minimum variation in an intra-speaker feature and (ii) maximum discrimination in inter-speaker features. We have selected a limited set of well-performing features (MFCC, Roll-off, Roughness, Brightness, and Irregularity) from the MIR toolbox using a Hybrid Selection Algorithm, which iteratively tests the performance of audio features. The accuracy of the system is tested with each feature independently and then by successively appending a new feature to the previous combination. It is concluded that a combined vector of the timbral features MFCC, Roll-off, Brightness, Roughness, and Irregularity performs best [14]. An association among the intra-speaker samples and a dissociation among the inter-speaker samples are confirmed by correlation analysis. This indicates that the selected timbral features perform well for speaker identification in a whispered voice, as validated by the identification experiments in subsequent sections.
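The correlation analysis mentioned above can be sketched as a Pearson correlation between two feature vectors (e.g. the summarized timbral features of two voice samples); high correlation suggests the same speaker, low or negative correlation suggests different speakers. An illustrative NumPy version, not the authors' exact procedure:

```python
import numpy as np

def feature_correlation(f1, f2):
    """Pearson correlation between two feature vectors of equal length."""
    return float(np.corrcoef(f1, f2)[0, 1])
```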