GZ05: Multimedia Systems: Audio Samples

This page gives all the audio samples I played during the GZ05 lectures, and a few more that there wasn't time for. The speakers in the lecture theatres aren't the best, so some of the artifacts were not as clear as they might be. I suggest listening in a fairly quiet environment, with a good set of headphones.

Some of the sound clips are quite large. Most of them are given as .wav files (uncompressed PCM, either 8-bit or 16-bit) because any further compression might interact badly with the compression artifacts I want you to hear.

Audio Basics

These sound clips illustrate the effects of sampling and quantization noise (in the time domain).

Aliasing Effects

A tone that increases from 50Hz to 8KHz, then decreases back to 50Hz again.

In the last sample the aliasing is clear: during the middle part of the clip, frequencies above 4KHz have been "folded" back into the region below 4KHz.
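The folding can be worked out numerically. The following is a minimal sketch, assuming the aliased clip was sampled at 8KHz with no anti-aliasing filter (the 4KHz fold point suggests this, but the sample rate of that clip is an assumption here). The function name `aliased_frequency` is made up for illustration.

```python
def aliased_frequency(f_hz, sample_rate_hz):
    """Return the frequency heard after sampling a pure tone of f_hz."""
    nyquist = sample_rate_hz / 2
    # The sampled spectrum repeats every sample_rate_hz.
    f = f_hz % sample_rate_hz
    # Anything above the Nyquist frequency is mirrored back below it.
    return sample_rate_hz - f if f > nyquist else f


# Sweep a few input frequencies, assuming an 8KHz sample rate.
for f in (50, 3000, 4000, 5000, 6000, 7000):
    print(f, "Hz is heard as", aliased_frequency(f, 8000), "Hz")
```

Under this assumption a 5KHz tone is heard at 3KHz and a 7KHz tone at 1KHz, so as the sweep climbs past 4KHz the audible pitch starts to fall again.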

Sampling and Aliasing of Music

A clip from Jean-Michel Jarre's Equinox:

The difference between the second and third clips is aliasing noise - sounds above 4KHz being sampled at 8KHz, and appearing as aliasing noise below 4KHz.

Quantization Effects

A simple tone, quantized to 256 linear levels, 16 linear levels, and 4 linear levels.

As the number of levels is decreased, more square edges are introduced, and higher frequency quantization noise appears.
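To put rough numbers on this, here is a small sketch that quantizes a sine tone to a given number of evenly spaced levels and measures the signal-to-noise ratio. The tone frequency, sample rate and the exact placement of the levels are assumptions for illustration, not the settings used to make the clips.

```python
import math


def quantize(x, levels):
    """Map x in [-1, 1] to the nearest of `levels` evenly spaced values."""
    step = 2.0 / (levels - 1)
    return round((x + 1.0) / step) * step - 1.0


def snr_db(levels, freq_hz=440.0, sample_rate_hz=8000, n=8000):
    """Signal-to-quantization-noise ratio (dB) for a full-scale sine tone."""
    signal = [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
              for i in range(n)]
    noise = [s - quantize(s, levels) for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum(e * e for e in noise) / n
    return 10 * math.log10(p_signal / p_noise)


for levels in (256, 16, 4):
    print(levels, "levels:", round(snr_db(levels), 1), "dB")
```

The ratio drops steadily as levels are removed, which matches what you hear: the error signal gets larger relative to the tone, and its sharp edges put energy at higher frequencies.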

Speech Coding

This first set of clips gives a basic comparison of speech codecs. Each clip contains the phrase "A lathe is a big tool. Grab every dish of sugar.", spoken in a quiet environment. The sentence is a bit strange, but it has a good mix of sounds: "th" (lathe) and "sh" (dish, sugar), which are unvoiced sounds; some difficult consonants, "b" (big), "t" (tool) and "G" (grab), which are part voiced, part unvoiced; and the "s" in "is", which is pronounced like a "z", simultaneously voiced and unvoiced.

[Sound files obtained from Data-Compression.com]

The next set of clips illustrates how well these codecs perform in a noisy environment. Notice in LPC how the background noise sometimes triggers the wrong voiced/unvoiced decision by the codec. This error is presumably also happening in CELP and G.729, but the residue coding mostly manages to compensate.

[I've lost the reference for where these came from - if you find them again, let me know so I can give credit]

The third set of clips illustrates what happens when you try and code music with a low bitrate speech codec. Pretty funny, but not very nice.

[I've lost the reference for where these came from - if you find them again, let me know so I can give credit]

Music Coding

Masking Effects

These clips illustrate frequency masking effects in perceptual codecs. They consist of one low-amplitude tone that pulses on and off, and then a second, higher-amplitude tone that appears. If the higher-amplitude tone is loud enough and close enough in frequency to the first tone, the first tone becomes inaudible, and so a perceptual codec would not need to encode it. In all of these clips the louder tone has the same amplitude; all that differs is its frequency.

These two clips illustrate that noise can be more effective at masking than a pure tone of the same amplitude. In these, the same 550Hz tone pulses on and off, but instead of a loud tone, loud narrowband noise is used. The noise does not need to be so loud to mask the 550Hz tone.
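If you want to experiment with tone masking yourself, a test signal of this kind can be generated along the following lines. This is only a sketch: the 550Hz quiet tone comes from the description above, but the masker frequency, both amplitudes, the pulse length and the function names are assumptions chosen for illustration, not the values used in the clips.

```python
import io
import math
import struct
import wave


def masking_clip(quiet_hz=550.0, loud_hz=600.0, quiet_amp=0.05, loud_amp=0.8,
                 sample_rate_hz=44100, seconds=4.0, pulse_seconds=0.25):
    """Quiet tone pulses on and off throughout; loud tone joins at half time."""
    samples = []
    for i in range(int(sample_rate_hz * seconds)):
        t = i / sample_rate_hz
        pulse_on = int(t / pulse_seconds) % 2 == 0
        quiet = quiet_amp * math.sin(2 * math.pi * quiet_hz * t) if pulse_on else 0.0
        loud = loud_amp * math.sin(2 * math.pi * loud_hz * t) if t >= seconds / 2 else 0.0
        samples.append(quiet + loud)
    return samples


def to_wav_bytes(samples, sample_rate_hz=44100):
    """Encode samples in [-1, 1] as a mono 16-bit PCM WAV file in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate_hz)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
    return buf.getvalue()


if __name__ == "__main__":
    with open("masking_test.wav", "wb") as f:
        f.write(to_wav_bytes(masking_clip()))
```

Varying `loud_hz` while keeping `loud_amp` fixed mimics the comparison described above: the further the loud tone moves from the quiet one, the more likely the quiet pulses are to become audible again.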

MP3 Encoding

Several clips to illustrate MP3's noise behaviour. At 192Kb/s, MP3 is pretty comparable with CD quality. At 128Kb/s I can hear the difference, but only when listening on good speakers or good headphones. Different people seem to have different thresholds for hearing MP3 compression artifacts.

At lower bitrates, MP3 artifacts become very noticeable. At 64Kb/s, MP3 would normally have reduced from a 44.1KHz sample rate down to a 22.05KHz sample rate, sacrificing higher frequencies to avoid introducing unacceptable noise in the more important lower frequencies. In these samples, I've forced LAME to maintain a 44.1KHz sample rate so the artifacts become more noticeable. Once you've heard them at 32Kb/s, you start to understand what sort of effects to listen for at higher bitrates.
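For a sense of scale, the following sketch compares these MP3 bitrates with uncompressed CD audio (44.1KHz, 16-bit, stereo). It assumes "Kb/s" means 1000 bits per second; the names used are made up for illustration.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


CD_BPS = pcm_bitrate_bps(44100, 16, 2)  # 1,411,200 bits per second


def compression_ratio(mp3_kbps):
    """How many times smaller the MP3 stream is than CD-quality PCM."""
    return CD_BPS / (mp3_kbps * 1000)


for kbps in (192, 128, 64, 32):
    print(kbps, "Kb/s is about", round(compression_ratio(kbps), 1), "to 1")
```

At 128Kb/s the encoder is discarding roughly ten out of every eleven bits, and at 32Kb/s it has about a forty-fourth of the original data to work with, which is why the artifacts become so obvious.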

Ogg Vorbis Encoding

Finally, a comparison between MP3 and Ogg Vorbis at 64Kb/s. At this data rate, MP3 has reduced the sample rate to 22.05KHz, whereas Vorbis is still using 44.1KHz, so the main difference is a loss of higher frequencies. But compare this to the Mendelssohn MP3 above sampled at 44.1KHz and compressed to 64Kb/s, where the artifacts are obvious and annoying.

[Sound clips taken from xiph.org listening comparisons.]