Audio encoding: MPEG-4 audio

MPEG-4 audio

The MPEG-4 audio standard provides a universal toolbox for transparent and efficient audio coding across many different application areas. Because it is universal, the same standard can serve different applications rather than being dedicated to a specific one: it is adapted to an application by selecting only the required tools from the toolbox. Another advantage over older standards is its expandability. New tools are still being developed for MPEG-4 to support even more applications. Predefined profiles, optimised for certain important applications, define which tools are used for them. Possible applications for MPEG-4 audio include internet streaming and downloads, digital radio broadcast, digital satellite and cable broadcast, portable players, audio data storage, multimedia services on third-generation mobile phone and wireless networks, and bidirectional communications.

MPEG-4 AAC

MPEG-4 AAC combines the tools for general audio coding, such as those used in digital audio and high definition television (HDTV) broadcasting. It builds on and extends the compression tools used in MPEG-2 AAC. MPEG-4 AAC provides the following additional tools:

Perceptual noise substitution

Noise-like sound, which may be part of normal sound material, is very difficult to encode because it has very little redundancy, if any. Experiments have shown that, under certain circumstances, a listener cannot distinguish between the original noise-like signal and general noise of the same amplitude generated by the decoder. Perceptual noise substitution (PNS) therefore sends the parameters of the noise-like signal, which require fewer bits to describe, in place of the signal itself. In a band where there is no dominant tone and no transient, the coefficients representing the band are replaced by a noise substitution flag and the total power of the coefficients. The decoder recreates the noise-like signal by generating random coefficients (general noise) with the same power as the original signal.
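The substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not the bitstream syntax of the standard: the function names, the dictionary used to carry the parameters and the uniform random generator are all assumptions made for the example.

```python
import math
import random

def pns_encode_band(coeffs):
    """Replace a noise-like band by a flag plus its total power.

    The dictionary layout is hypothetical; a real encoder writes
    these values into the bitstream instead.
    """
    energy = sum(c * c for c in coeffs)
    return {"pns": True, "energy": energy, "width": len(coeffs)}

def pns_decode_band(params, rng):
    """Generate random coefficients and scale them to the sent power."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(params["width"])]
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0.0:
        return [0.0] * params["width"]
    scale = math.sqrt(params["energy"] / noise_energy)
    return [n * scale for n in noise]

# A band assumed to be noise-like (no dominant tone, no transient).
band = [0.3, -0.7, 0.2, 0.5, -0.4, 0.1, -0.6, 0.8]
params = pns_encode_band(band)
decoded = pns_decode_band(params, random.Random(0))
decoded_energy = sum(c * c for c in decoded)
```

The decoded coefficients differ from the originals, but their total power matches the transmitted value, which is the property the listener is assumed to be sensitive to.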

Long-term prediction

Long-term prediction (LTP) is a further development of the prediction used in MPEG-2 AAC, described earlier. The tool is especially effective for the parts of a signal that have a clear pitch (i.e. with many tonal components, such as a solo violin or speech). It exploits the temporal redundancy between the current and the preceding frame (backward prediction).

Referring to Figure 6.13, the spectral coefficients of the preceding frame are fed through a decoder and matched against the current frame to find the best prediction parameters. The predicted frame is then passed through a filter bank and TNS filter, and the result is subtracted from the current frame, which has gone through an identical filter bank and TNS filter, to give a residual error signal. A frequency-selective switch then chooses, for each band, either the residual or the original signal for further coding, whichever needs the smaller bit rate.
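The two decisions made by long-term prediction, finding the prediction parameters and switching per band between residual and original, can be sketched as follows. This is a simplified illustration under stated assumptions: the lag search runs directly on time-domain samples, the filter bank and TNS stages are left out, signal energy stands in for the bit rate needed by a band, and all function names are hypothetical.

```python
import math

def energy(samples):
    return sum(s * s for s in samples)

def best_lag_and_gain(history, frame, min_lag, max_lag):
    """Find the delay and gain that best predict `frame` from `history`.

    history holds past decoded samples, most recent last. Only lags of
    at least len(frame) are tried, so each candidate lies fully in the past.
    """
    n = len(frame)
    best_lag, best_gain, best_score = None, 0.0, 0.0
    for lag in range(max(min_lag, n), min(max_lag, len(history)) + 1):
        start = len(history) - lag
        segment = history[start:start + n]
        seg_energy = energy(segment)
        if seg_energy == 0.0:
            continue
        corr = sum(x * p for x, p in zip(frame, segment))
        score = corr * corr / seg_energy  # energy removed by the optimal gain
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / seg_energy, score
    return best_lag, best_gain

def predict(history, lag, gain, n):
    start = len(history) - lag
    return [gain * p for p in history[start:start + n]]

def select_per_band(original_bands, residual_bands):
    """Per band, keep the cheaper signal (energy used as a proxy for bits)."""
    chosen = []
    for orig, resid in zip(original_bands, residual_bands):
        use_prediction = energy(resid) < energy(orig)
        chosen.append((use_prediction, resid if use_prediction else orig))
    return chosen

# A strongly pitched test signal with a period of 8 samples.
signal = [math.sin(2 * math.pi * t / 8) for t in range(80)]
history, frame = signal[:64], signal[64:]
lag, gain = best_lag_and_gain(history, frame, min_lag=16, max_lag=48)
prediction = predict(history, lag, gain, len(frame))
residual = [x - p for x, p in zip(frame, prediction)]

# Made-up spectral bands: prediction helps in the first, not in the second.
choices = select_per_band([[1.0, -2.0], [0.1, 0.1]],
                          [[0.2, 0.1], [0.5, -0.4]])
```

For the periodic test signal the search settles on a lag that is a multiple of the pitch period, and the residual is close to zero, which is why the tool pays off on tonal material.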

Twin vector quantisation

In MPEG-2 AAC, the quantised coefficients are encoded using Huffman coding techniques. Where the channel bit rate is low, the quantisation becomes coarse, resulting in errors. For this reason, MPEG-4 AAC provides an alternative coding system for bit rates below 16 kbps, namely twin vector quantisation (TVQ). TVQ works on blocks of coefficients rather than individual coefficients, with one symbol representing a number of coefficients. Error is minimised by the use of interleaving.
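The idea of one symbol standing for a block of interleaved coefficients can be sketched as below. This is only an illustration of interleaving followed by nearest-neighbour vector quantisation: the four-entry codebook and the function names are invented for the example, and the perceptual weighting and codebook design of a real TVQ coder are omitted.

```python
# Hypothetical codebook of 4-dimensional code vectors.
CODEBOOK = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, -1.0, 0.0),
    (0.5, 0.5, 0.5, 0.5),
    (-1.0, 1.0, -1.0, 1.0),
]

def interleave(coeffs, num_vectors):
    """Build sub-vectors by taking every num_vectors-th coefficient."""
    return [coeffs[i::num_vectors] for i in range(num_vectors)]

def deinterleave(vectors):
    """Put the sub-vectors back into their original coefficient order."""
    num = len(vectors)
    out = [0.0] * sum(len(v) for v in vectors)
    for i, vec in enumerate(vectors):
        out[i::num] = vec
    return out

def nearest_index(vector, codebook):
    """Index of the code vector with the smallest squared error."""
    def sq_error(k):
        return sum((a - b) ** 2 for a, b in zip(vector, codebook[k]))
    return min(range(len(codebook)), key=sq_error)

def tvq_encode(coeffs, codebook, num_vectors):
    """One index (symbol) per interleaved sub-vector."""
    return [nearest_index(v, codebook) for v in interleave(coeffs, num_vectors)]

def tvq_decode(indices, codebook):
    return deinterleave([list(codebook[k]) for k in indices])

coeffs = [0.9, 0.4, 0.1, 0.6, -1.1, 0.5, 0.0, 0.4]
indices = tvq_encode(coeffs, CODEBOOK, num_vectors=2)
decoded = tvq_decode(indices, CODEBOOK)
```

Here eight coefficients are sent as two codebook indices. Because each sub-vector draws its coefficients from across the whole block, the quantisation error of any one symbol is spread over the spectrum rather than concentrated in neighbouring coefficients.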
