8 Audio Coding

8.1 Digital Audio Source Signal

The human ear has a dynamic range of about 140 dB and a hearing bandwidth of up to 20 kHz. High-quality audio signals must, therefore, match these characteristics. Before the analog audio signals are sampled and digitized, they have to be band-limited by means of a low-pass filter. Analog-to-digital conversion is then performed at a sampling rate of 32 kHz, 44.1 kHz or 48 kHz (and now also at 96 kHz), with a resolution of at least 16 bits. The 44.1 kHz sampling rate corresponds to that of audio CDs; 48 kHz and 96 kHz are studio quality. While the 32 kHz sampling frequency is still provided for in the MPEG standard, it is in fact obsolete. A sampling rate of 48 kHz at 16 bit resolution yields a data rate of 768 kbit/s per channel, which means approx. 1.5 Mbit/s for a stereo signal (Fig. 8.1.).
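The data rates quoted above follow from multiplying sampling rate, resolution and number of channels. A minimal Python sketch of that arithmetic is given below; the function name `pcm_bit_rate` is chosen here for illustration only and is not taken from any standard.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the uncompressed PCM data rate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


# Sampling rates named in the text, all at 16 bit resolution.
for rate in (32_000, 44_100, 48_000, 96_000):
    mono = pcm_bit_rate(rate, 16)
    stereo = pcm_bit_rate(rate, 16, channels=2)
    print(f"{rate / 1000:g} kHz: {mono / 1000:g} kbit/s per channel, "
          f"{stereo / 1e6:.3f} Mbit/s stereo")
```

For 48 kHz and 16 bits this gives 768 kbit/s per channel and 1.536 Mbit/s for a stereo pair, which is the approx. 1.5 Mbit/s stated above.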

Fig. 8.1. Digital audio source signal

(Figure: the left and right channels are each band-limited to 15...20 kHz and A/D-converted at a sampling frequency of 32/44.1/48 kHz with 16 bit resolution, giving up to 768 kbit/s per channel; together approx. 1.5 Mbit/s, which compression reduces to 100...400 kbit/s.)

The objective of audio compression is to reduce the 1.5 Mbit/s data rate to between about 100 kbit/s and 400 kbit/s. MP3 audio files, which are very widely used today, often have a data rate below 64 kbit/s. As with video compression, this is achieved by way of redundancy reduction and irrelevance reduction. In redundancy reduction, superfluous information is simply omitted; there is no loss of information. By contrast, irrelevance reduction eliminates information that cannot be perceived at the receiving end, in this case by the human ear. All audio compression methods are based on a psychoacoustic model, i.e. they make use of the "imperfection" of the human ear to remove irrelevant information from the audio signal. The human ear is not capable of perceiving sound events that lie close, in frequency or in time, to strong sound pulses. This means that, to the ear, certain sound events will mask other sound events of lower amplitude.

W. Fischer, Digital Video and Audio Broadcasting Technology, Signals and Communication Technology,

https://doi.org/10.1007/978-3-030-32185-7_8

8.2 History of Audio Coding

In the year 1988, the MASCAM method was developed at the Institut für Rundfunktechnik (IRT) in Munich in preparation for the digital audio broadcasting (DAB) system. From MASCAM, the MUSICAM (masking pattern universal subband integrated coding and multiplexing) method was developed in 1989 in cooperation with CCETT, Philips and Matsushita.

MUSICAM-coded audio signals are used in DAB. MASCAM and MUSICAM are both based on subband coding. The audio signal is split into a large number of subbands, each of which is subjected to irrelevance reduction to a greater or lesser degree.

At the same time as the subband coding method was developed, the Fraunhofer Gesellschaft together with Thomson devised the ASPEC (Adaptive Spectral Perceptual Entropy Coding) method, which is based on transform coding. The audio signal is transformed from the time to the frequency domain using DCT (Discrete Cosine Transform), and then irrelevant signal components are removed.
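The principle of transform coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ASPEC algorithm: the helper names `dct_ii` and `drop_small` are hypothetical, and the fixed magnitude threshold is a crude stand-in for a real psychoacoustic decision about which components are irrelevant.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def drop_small(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold.

    Assumption: a fixed threshold replaces the psychoacoustic model here.
    """
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


N = 32
# Test block: a strong component aligned with DCT bin 4 plus a much
# weaker one aligned with bin 11 (amplitude ratio 100:1).
block = [
    math.cos(math.pi / N * (n + 0.5) * 4)
    + 0.01 * math.cos(math.pi / N * (n + 0.5) * 11)
    for n in range(N)
]

coeffs = dct_ii(block)                     # time domain -> frequency domain
kept = drop_small(coeffs, threshold=1.0)   # discard the weak components
nonzero = [k for k, c in enumerate(kept) if c != 0.0]
print("bins kept after thresholding:", nonzero)
```

In the frequency domain the block is described by a single significant coefficient, so far fewer values need to be transmitted than the 32 time-domain samples.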

Both the subband-coding MUSICAM and the transform-coding ASPEC method were included in the MPEG-1 audio compression method, which was established in 1991 (ISO/IEC 11172-3 standard). MPEG-1 audio comprises three possible layers: layers I and II essentially use MUSICAM coding, and layer III principally uses ASPEC coding. MP3 audio files are coded to MPEG-1 layer III. MP3 is often mistaken for MPEG-3. MPEG-3 was originally aimed at implementing HDTV (high definition television), but HDTV was already integrated in the MPEG-2 standard, so MPEG-3 was skipped and abandoned altogether. An MPEG-3 standard therefore does not exist.


In MPEG-2 audio, the three layers of MPEG-1 audio were taken over, and layer II was extended to form layer II MC (multichannel). The ISO/IEC 13818-3 MPEG-2 audio standard was adopted in 1994.

Fig. 8.2. Development of MPEG audio [DAMBACHER]

(Figure: MASCAM (IRT Munich, 1988) led to MUSICAM (IRT, CCETT, Philips, Matsushita, 1989), a subband coding method; ASPEC (Fraunhofer Gesellschaft, Thomson) is a transform coding method (DCT). Both fed into ISO/IEC 11172-3 MPEG-1 audio, 1990/91: layer I, low-complexity encoder, low compression; layer II, medium-complexity encoder; layer III, high-complexity encoder, high compression, subband and transform coding (...mp3). ISO/IEC 13818-3 MPEG-2 audio, 1994: layers I, II, III (same as MPEG-1) and layer II MC (multichannel audio up to 5.1). Data rates: layer I 32...384 kbit/s, layer II 32...448 kbit/s, layer III 32...192 kbit/s.)

Simultaneously with MPEG audio, the Dolby digital audio standard (also known as AC-3 audio) was developed by Dolby Labs in the USA. This standard was laid down in 1990 and first presented to the public in the movie "Star Trek VI", shown in December 1991. Nowadays, many movies employ the Dolby digital technique. In the USA, digital terrestrial TV broadcasts to ATSC use AC-3 audio coding exclusively. Some other countries, too, introduced AC-3 audio in addition to MPEG audio. Using both AC-3 audio and MPEG audio makes sense, if only because it does away with the recoding of movies. In terms of quality, there is practically no difference between MPEG audio and Dolby digital. Modern MPEG decoder chips therefore support both methods.

DVD video discs, too, may use Dolby digital AC-3 audio in addition to PCM audio and MPEG audio. Below is a short overview of the development of Dolby digital:

• 1990: Dolby digital AC-3 audio
• 1991: first AC-3 audio coded movie showing
• Dec. 1991: "Star Trek VI" coded in AC-3 audio

Today:

• AC-3 audio is used as standard in many movies, in ATSC and, in addition to MPEG audio, in MPEG-2 transport streams all over the world, and on DVDs.
• Dolby AC-3 audio: transform coding based on the Modified Discrete Cosine Transform (MDCT); 5.1 audio channels (left, center, right, left surround, right surround, subwoofer), 128 kbit/s per channel.

MPEG, too, has come up with new audio coding methods:

• MPEG-2 AAC, ISO/IEC 13818-7 (AAC = Advanced Audio Coding)
• MPEG-4, ISO/IEC 14496-3: AAC and AAC Plus

Fig. 8.3. Anatomy of the human ear

(Figure labels: outer ear, middle ear, inner ear; eardrum; ossicles (malleus, incus, stapes); Eustachian tube; semicircular canals (organ of balance); cochlea (organ of Corti); auditory nerves.)


8.3 Psychoacoustic Model of the Human Ear

In the following section, the process of audio compression will be discussed. Redundancy reduction (lossless) and irrelevance reduction (lossy) lower the data rate of the original audio signal by about 90 %. Irrelevance reduction relies on the psychoacoustic model of the human ear, which essentially goes back to Professor Zwicker, formerly professor of electroacoustics at the Technical University of Munich. This type of reduction is based on what is referred to as perceptual coding: audio components that are not perceived by the human ear are not transmitted.
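To relate the figure of about 90 % to the data rates from Section 8.1, the reduction can be computed for a 48 kHz, 16 bit stereo source. The sketch below is illustrative only: the function name `reduction_percent` is hypothetical, and the 192 kbit/s value is an example picked from within the 100...400 kbit/s range rather than a figure from the text.

```python
def reduction_percent(source_kbit_s: float, coded_kbit_s: float) -> float:
    """Return by how many percent the coded rate is below the source rate."""
    return 100.0 * (1.0 - coded_kbit_s / source_kbit_s)


# Stereo PCM source: 2 channels x 48 kHz x 16 bit = 1536 kbit/s.
SOURCE_KBIT_S = 2 * 48_000 * 16 / 1000

# 100 and 400 kbit/s bound the target range; 192 kbit/s is an assumed example.
for coded in (100, 192, 400):
    print(f"{coded} kbit/s: {reduction_percent(SOURCE_KBIT_S, coded):.1f} % reduction")
```

Coded rates toward the lower end of the target range therefore correspond to a reduction of roughly 90 %, while the upper end corresponds to somewhat less.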

Let us first have a look at the anatomy of the human ear (Fig. 8.3., 8.4.).

The ear consists of three main parts: the outer ear, the middle ear, and the inner ear. The outer ear provides impedance matching and airborne sound transmission, and acts as a filter with a slight resonance step-up in the region of 3 kHz. It is in the same region, i.e. from 3 kHz to 4 kHz, that the human ear exhibits its maximum sensitivity. The eardrum, or tympanic membrane, converts sound waves to mechanical vibrations, which are transmitted via the malleus, incus and stapes to a membranous window leading to the sensory inner ear. The air pressure must be the same ahead of and behind the eardrum. This is ensured by a tube connecting the region behind the eardrum with the pharynx, called the Eustachian tube. Everyone knows the problem of pressure building up in the ear when climbing to great heights. On swallowing, the mucous membrane in the Eustachian tube allows the pressure to equalize.

In the inner ear we find the organ of balance, which is made up of several liquid-filled arches, and the cochlea. The cochlea is the actual hearing organ (organ of Corti) by which sound is directly perceived. If the cochlea were to be uncoiled, the sensors for the high frequencies would be found at its entrance, then the sensors for the medium frequencies, and at the end of the cochlea would be the sensors for the low frequencies.

The cochlea consists of a spiral canal in which lies a smaller membranous spiral passage that becomes wider from the front to the rear. On the inner membrane rest the frequency-selective sound-collecting sensors from which the auditory nerves extend to the brain. The auditory nerves transport electrical signals with an amplitude of approx. 100 mVpp. The repetition rate of the electrical pulses is on the order of 1 kHz. This rate conveys the volume of a tone at a given frequency: the louder the tone, the higher the repetition rate. Each frequency sensor communicates with the brain via a separate neural line. The frequency se-
