Multi-channel formats
Multi-channel sound can take many formats as listed in Table 6.2.
In addition, a low-frequency effects (LFE) channel, also known as a sub-woofer channel, dedicated to frequencies of 120 Hz and below, is available with all combinations. For example, Dolby Digital 5.1 adds a sub-woofer channel to the 5-channel 3/2 mode, and 7.1 adds one to the 7-channel 5/2 mode. The LFE channel carries the deep bass effects that give cinema (theatre) sound much of its impact. A typical 5.1 surround sound arrangement is shown in Figure 6.14.
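The "x.y" labels count the full-range channels before the point and the band-limited LFE channel after it, so a 3/2 layout plus LFE becomes 5.1, or six discrete channels. The minimal Python sketch below only illustrates that naming arithmetic; the function names `channel_label` and `total_channels` are hypothetical and are not part of any standard.

```python
def channel_label(front: int, surround: int, lfe: bool = True) -> str:
    """Hypothetical helper: turn a front/surround layout into an 'x.y' label.

    The full-range channels (front + surround) go before the point and the
    LFE channel, if present, is counted after it.
    """
    full_range = front + surround
    return f"{full_range}.{1 if lfe else 0}"


def total_channels(label: str) -> int:
    """Hypothetical helper: count discrete channels in an 'x.y' label."""
    full_range, lfe = label.split(".")
    return int(full_range) + int(lfe)


if __name__ == "__main__":
    for front, surround in [(3, 2), (5, 2)]:
        label = channel_label(front, surround)
        print(f"{front}/{surround} + LFE -> {label} ({total_channels(label)} channels)")
```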
Perception of sounds in space
We live in a reverberant world, with sounds arriving from all directions, some direct and some reflected. If the auditory system were to distinguish every one of these arrivals as a separate sound, we would hear a confusing cacophony. As was stated earlier, the ear has finite temporal discrimination as well as finite frequency discrimination. When two or more versions of the same sound (direct and reflected) arrive at the ear, they are not treated as separate sounds unless they are separated by more than 50–60 ms, in which case they may be heard as echoes. This is why echoes are heard only in a valley or a dome, where the reflecting surface is some distance away. Versions of a sound that are separated by about 30 ms or less are perceived as a single sound. This fusion, however, does not impair the ability of the auditory system to locate the source of the sound, which it does from the version that arrives at the ear first: that version has travelled the shortest distance and must therefore come from the source. This phenomenon is known as the precedence effect. The reflected sounds are the reverberations that convey the ambience of real-life surround sound. Because they are at a relatively low level, most coding systems assume them to be inaudible and therefore do not transmit them. It is this ambience that has to be re-created at the decoding end.
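The time gap between direct and reflected arrivals follows from the path-length difference divided by the speed of sound. The sketch below assumes a speed of sound of 343 m/s (air at roughly 20 °C) and uses the approximate thresholds quoted above: fusion at 30 ms or less, a possible echo beyond about 50 ms. The function names are hypothetical, and the classification is only a rough illustration, since real perception also depends on the level and type of the signal.

```python
# Assumption: speed of sound in air at roughly 20 degrees C.
SPEED_OF_SOUND = 343.0  # m/s


def reflection_delay_ms(direct_m: float, reflected_m: float) -> float:
    """Extra arrival time of the reflected path relative to the direct path, in ms."""
    return (reflected_m - direct_m) / SPEED_OF_SOUND * 1000.0


def perceived_as(delay_ms: float) -> str:
    """Rough classification using the approximate thresholds quoted in the text."""
    if delay_ms <= 30.0:
        return "fused"          # heard as one sound, located by the first arrival
    if delay_ms > 50.0:
        return "possible echo"  # may be heard as a separate echo
    return "intermediate"       # between the two quoted thresholds


if __name__ == "__main__":
    # Direct path 2 m; reflected paths 6 m and 20 m longer respectively.
    for reflected in (8.0, 22.0):
        d = reflection_delay_ms(2.0, reflected)
        print(f"reflected path {reflected} m: {d:.1f} ms -> {perceived_as(d)}")
```

A reflection whose path is 6 m longer than the direct path arrives about 17.5 ms later and fuses with the direct sound, whereas a 20 m difference gives about 58 ms, which is in the echo range.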
Spatial perception is primarily attributed to three parameters, or cues, that describe how humans localise sound in the horizontal plane: inter-aural level differences (ILD), inter-aural time differences (ITD) and inter-aural coherence (IC). These three cues are illustrated in Figure 6.15, which schematically shows a human head and a distant sound source. Direct (first-arrival) sound from the source impinges on the left ear, while the direct sound received by the right ear is diffracted around the head, arriving later and attenuated. These two effects give rise to the ITD and ILD cues associated with a given source. The IC cue arises when the two ears receive partly uncorrelated sound: either reflections of a point source in a reverberant environment reaching both ears, or the sound of a diffuse source.
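The three cues can be estimated from a pair of ear (or channel) signals. The sketch below is a broadband simplification that assumes common textbook definitions: ILD as the left/right energy ratio in dB, ITD as the lag that maximises the normalised cross-correlation, and IC as the value of that maximum. Practical systems evaluate the cues per frequency band; the function `interaural_cues` and its `max_lag` parameter are hypothetical.

```python
import math
import random


def interaural_cues(left, right, fs, max_lag):
    """Broadband ILD (dB), ITD (s) and IC from two equal-length signals.

    A positive ITD means the right signal lags the left one.
    """
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    ild_db = 10.0 * math.log10(e_l / e_r)

    n = len(left)
    best_lag, best_corr = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag], keeping both indices in range.
        acc = sum(left[i] * right[i + lag]
                  for i in range(max(0, -lag), min(n, n - lag)))
        corr = acc / math.sqrt(e_l * e_r)
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    return ild_db, best_lag / fs, best_corr


if __name__ == "__main__":
    rng = random.Random(1)
    fs = 48_000
    left = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # Right ear: the same noise, 3 samples later and at half the amplitude.
    right = [0.0] * 3 + [0.5 * x for x in left[:-3]]
    ild, itd, ic = interaural_cues(left, right, fs, max_lag=48)
    print(f"ILD {ild:.1f} dB, ITD {itd * 1e6:.1f} us, IC {ic:.3f}")
```

For this delayed, attenuated noise the printed values should be close to 6 dB, 62.5 µs (three samples at 48 kHz) and 1. `max_lag` only needs to cover the largest plausible ITD, which for a human head is well under 1 ms.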
MPEG surround exploits inter-channel differences in level, phase and coherence, equivalent to the ILD, ITD and IC cues, to capture the spatial image of a multi-channel audio signal relative to a stereo (or mono) signal constructed from the original multi-channel signals. The cues are encoded in a very compact form and included in the side data portion of an MPEG audio packet or in a separate auxiliary packet. Figure 6.16 illustrates the principle of MPEG surround sound encoding. The MPEG surround encoder receives a multi-channel audio signal, e.g. 5.1, a total of six channels. These are fed into a downmixer to produce a 2-channel downmix signal for stereo (or a 1-channel downmix signal for mono). The downmix signal is a faithful representation of the original multi-channel signal in the stereophonic (or monophonic) domain. It is this downmix signal, rather than the original multi-channel signal, that is compressed for transmission. The multi-channel signal is also fed into a spatial parameter estimation block, which extracts the ILD, ITD and IC cues of the input surround sound for inclusion in the audio packet bitstream.
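A minimal sketch of the two encoder branches in Figure 6.16 follows, under clearly stated assumptions. The downmix uses the commonly used -3 dB (1/√2) weights for centre and surround channels and drops the LFE. The "spatial parameters" are only broadband level differences and zero-lag coherence between each front/surround pair, with no time or phase parameter. The actual MPEG surround encoder computes its parameters per time/frequency tile and defines its own downmix, and all function names here are hypothetical.

```python
import math

# Assumption: commonly used -3 dB (1/sqrt(2)) weights for centre and surrounds.
G = 1.0 / math.sqrt(2.0)


def downmix_51_to_stereo(L, R, C, LFE, Ls, Rs):
    """Sample-by-sample 5.1 -> stereo downmix; the LFE input is ignored here."""
    lo = [l + G * c + G * ls for l, c, ls in zip(L, C, Ls)]
    ro = [r + G * c + G * rs for r, c, rs in zip(R, C, Rs)]
    return lo, ro


def energy(x):
    return sum(v * v for v in x)


def cld_db(a, b, eps=1e-12):
    """Broadband channel level difference of a relative to b, in dB."""
    return 10.0 * math.log10((energy(a) + eps) / (energy(b) + eps))


def icc(a, b, eps=1e-12):
    """Zero-lag normalised correlation, used here as a coherence measure."""
    return sum(x * y for x, y in zip(a, b)) / math.sqrt(energy(a) * energy(b) + eps)


def encode(L, R, C, LFE, Ls, Rs):
    """Return the stereo downmix plus hypothetical broadband spatial parameters."""
    downmix = downmix_51_to_stereo(L, R, C, LFE, Ls, Rs)
    params = {
        "cld_L_Ls": cld_db(L, Ls), "icc_L_Ls": icc(L, Ls),
        "cld_R_Rs": cld_db(R, Rs), "icc_R_Rs": icc(R, Rs),
    }
    return downmix, params
```

Because the downmix and the parameters are produced by separate branches, a stereo-only receiver can play the downmix directly and simply ignore the parameters.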
A key aspect of the MPEG surround technique is that the transmitted downmix (e.g. stereo) is an excellent stereo version of the multi-channel signal. This is vital, since stereo remains one of the main listening modes, primarily over headphones on devices such as portable music players. Additionally, MPEG surround supports a mode in which the downmix is compatible with popular matrix surround decoders, e.g. Dolby Surround.
At the decoding stage, the cue parameters are used to expand the downmix signal into a high-quality multi-channel output (Figure 6.17). The operation involves a filterbank analyser that performs a high-resolution time/frequency transformation in preparation for the 2-to-6 upmix process. Using the transmitted spatial cues, the upmix processor converts the 2-channel time/frequency representation of the input downmix into a 6-channel time/frequency representation, which a 6-channel synthesis filterbank then converts into a reconstruction of the original 6-channel surround sound.
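In the decoder, each transmitted level difference tells the upmix processor how to share the energy of a downmix signal between two output channels. The sketch below shows only that step, for one downmix channel split into two outputs (for example the left downmix into L and Ls). It assumes the level difference is given in dB and that the two outputs are fully coherent (IC = 1). A real MPEG surround decoder applies such gains per time/frequency tile of the analysis filterbank output and mixes in decorrelated signals to reproduce the transmitted coherence; `ott_gains` and `upmix_pair` are hypothetical names.

```python
import math


def ott_gains(cld_db):
    """Gains that split one signal into two with the given level difference (dB).

    The gains satisfy g1**2 + g2**2 == 1 (the split preserves energy) and
    g1**2 / g2**2 == 10**(cld_db / 10).
    """
    ratio = 10.0 ** (cld_db / 10.0)
    g1 = math.sqrt(ratio / (1.0 + ratio))
    g2 = math.sqrt(1.0 / (1.0 + ratio))
    return g1, g2


def upmix_pair(downmix, cld_db):
    """Fully coherent (IC = 1) split of one downmix channel into two outputs."""
    g1, g2 = ott_gains(cld_db)
    return [g1 * x for x in downmix], [g2 * x for x in downmix]


if __name__ == "__main__":
    front, surround = upmix_pair([1.0, -0.5, 0.25], cld_db=6.0)
    print(front, surround)
```

For example, a level difference of 6 dB (roughly a 4:1 energy ratio) gives gains of about 0.89 and 0.45, while 0 dB splits the signal equally with gains of 1/√2.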