GZ05: Multimedia Systems: Audio Samples


This page gives all the audio samples I played during the GZ05 lectures, and a few more that there wasn't time for. The speakers in the lecture theatres aren't the best, so some of the artifacts were not as clear as they might have been. I suggest listening in a fairly quiet environment with a good set of headphones.

Some of the sound clips are quite large. Most of them are given as .wav files (uncompressed PCM, either 8-bit or 16-bit) because any further compression might interact badly with the compression artifacts I want you to hear.

Audio Basics

These sound clips illustrate the effects of sampling and quantization noise (in the time domain).

Aliasing Effects

A tone that sweeps up from 50Hz to 8KHz, then back down to 50Hz again.

The original 48KHz sampled file:
.wav file [8 bit, mono, 384KB]
Downsampled to 8KHz using a 4KHz anti-aliasing filter:
.wav file [8 bit, mono, 64KB]
Downsampled to 8KHz by simply taking one sample in six:
.wav file [8 bit, mono, 64KB]

In the last clip the aliasing is clear: during the middle part of the sweep, frequencies above 4KHz have been "folded" back into the 0-4KHz range, so while the real tone is still rising, what you hear is falling.
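The folding can be checked with a little arithmetic. Below is a minimal Python sketch (the function name alias_frequency is my own, not from any library) that computes where a tone lands after sampling, and then confirms it numerically by taking one sample in six from a 6KHz tone at 48KHz, the same naive downsampling used for the third clip.

```python
import math

# Hypothetical helper for illustration; not the code used to make these clips.
def alias_frequency(f: float, fs: float) -> float:
    """Frequency (Hz) at which a tone of frequency f appears after sampling at fs.

    The sampled spectrum repeats every fs Hz and is mirrored about fs/2,
    so any input frequency folds into the range 0..fs/2.
    """
    f = f % fs                            # spectrum repeats every fs
    return fs - f if f > fs / 2 else f    # mirror the upper half about fs/2

# Numerical check: take one sample in six from a 6KHz tone sampled at 48KHz.
fs_in, factor = 48000, 6
fs_out = fs_in // factor                  # 8000 Hz
n_out = 64
tone = [math.sin(2 * math.pi * 6000 * n / fs_in) for n in range(n_out * factor)]
decimated = tone[::factor]                # naive decimation, no anti-aliasing filter

# The decimated samples match a 2KHz tone at 8KHz (with inverted sign).
alias = alias_frequency(6000, fs_out)
expected = [-math.sin(2 * math.pi * alias * n / fs_out) for n in range(n_out)]
max_err = max(abs(a - b) for a, b in zip(decimated, expected))
print(alias, max_err)
```

For the sweep, alias_frequency(8000, 8000) is 0, so at the top of the sweep the aliased tone has folded all the way down to 0Hz before the real tone turns around.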

Sampling and Aliasing of Music

A clip from Jean-Michel Jarre's Equinox:

The original file, at 48KHz sampling:
.wav file [2MB]
Downsampled to 8KHz using a 4KHz anti-aliasing (low-pass) filter:
.wav file [2MB]
Downsampled to 8KHz by simply taking one sample in six:
.wav file [2MB]

The difference between the second and third clips is aliasing noise: without the low-pass filter, sounds above 4KHz are sampled at 8KHz and reappear as spurious components below 4KHz.
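The filter-then-decimate approach used for the second clip can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a windowed-sinc FIR low-pass with a Hamming window and 101 taps, with the cutoff placed at exactly 4KHz as described above. A real design would put the whole transition band below 4KHz, because components just above 4KHz still leak partly through a filter like this one and alias. The helper names are hypothetical.

```python
import math

# Hypothetical helpers: a sketch of anti-aliased downsampling, not the tool used for these clips.
def lowpass_fir(cutoff_hz: float, fs: float, num_taps: int = 101) -> list[float]:
    """Windowed-sinc low-pass FIR (Hamming window), normalised to unity DC gain."""
    fc = cutoff_hz / fs                          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        x = i - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def filter_and_decimate(signal, taps, factor):
    """Convolve with taps, computing only every factor-th output sample."""
    out = []
    for n in range(len(taps) - 1, len(signal), factor):
        out.append(sum(taps[k] * signal[n - k] for k in range(len(taps))))
    return out

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

fs_in, factor = 48000, 6
taps = lowpass_fir(4000, fs_in)
t = range(4800)                                  # 0.1 s at 48KHz
low = [math.sin(2 * math.pi * 1000 * n / fs_in) for n in t]
high = [math.sin(2 * math.pi * 6000 * n / fs_in) for n in t]

rms_low = rms(filter_and_decimate(low, taps, factor))    # 1KHz passes almost unchanged
rms_high = rms(filter_and_decimate(high, taps, factor))  # 6KHz is removed before it can alias
print(rms_low, rms_high)
```

Without the filter, the 6KHz tone would survive decimation at full amplitude as a 2KHz alias; with it, almost nothing is left to fold back.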

Quantization Effects

A simple tone, quantized to 256 linear levels, 16 linear levels, and 4 linear levels.

256 levels: .wav file [48KHz sampled, 8 bit, mono, 192KB]
16 levels: .wav file [48KHz sampled, 8 bit, mono, 192KB]
4 levels: .wav file [48KHz sampled, 8 bit, mono, 192KB]

As the number of levels decreases, the waveform becomes increasingly step-like, with more square edges, and more high-frequency quantization noise appears.
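One way to put numbers on this is the signal-to-quantization-noise ratio (SQNR). The Python sketch below assumes a uniform mid-rise quantizer over [-1, 1] and a full-scale 440Hz sine at 48KHz (my choices, not necessarily how the clips were generated); the function names are hypothetical. For a full-scale sine the textbook rule of thumb is roughly 6 dB per bit, about 6.02 × log2(levels) + 1.76 dB.

```python
import math

# Hypothetical helpers: an illustrative uniform quantizer, not the one used for the clips.
def quantize(x: float, levels: int) -> float:
    """Uniform mid-rise quantizer mapping [-1, 1] onto `levels` equally spaced values."""
    step = 2.0 / levels
    index = math.floor(x / step)
    index = max(-levels // 2, min(levels // 2 - 1, index))  # clamp to the valid range
    return (index + 0.5) * step

def sqnr_db(levels: int, freq: float = 440.0, fs: int = 48000, n: int = 48000) -> float:
    """Signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    signal = [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]
    noise = [quantize(s, levels) - s for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum(e * e for e in noise) / n
    return 10 * math.log10(p_signal / p_noise)

for levels in (256, 16, 4):
    print(levels, round(sqnr_db(levels), 1))
```

Going from 256 levels to 4 throws away 6 bits, so the noise rises by roughly 36 dB, which is why the 4-level clip sounds so harsh.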

Speech Coding

This first set of clips gives a basic comparison of speech codecs, using the phrase "A lathe is a big tool. Grab every dish of sugar." spoken in a quiet environment. The sentence is a bit strange, but it has a good mix of sounds: "th" (lathe) and "sh" (dish, sugar), which are unvoiced; some difficult consonants, "b" (big), "t" (tool) and "g" (grab), which are part voiced, part unvoiced; and the "s" in "is", which is pronounced like a "z" and is simultaneously voiced and unvoiced.

[Sound files obtained from Data-Compression.com]

Original PCM u-law
.wav file [8KHz sampled, u-law, mono, 32KB]
ADPCM (32 Kb/s)
.wav file [8KHz sampled, u-law, mono, 32KB]
CELP (4.8 Kb/s)
.wav file [8KHz sampled, u-law, mono, 32KB]
LPC10 (2.4 Kb/s)
.wav file [8KHz sampled, u-law, mono, 32KB]
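The original here is 8-bit u-law PCM. The idea behind u-law is to compress the amplitude logarithmically before quantizing, so quiet samples get finer steps than loud ones. The Python sketch below uses the continuous u-law curve with mu = 255; note that the real G.711 codec uses a segmented, piecewise-linear approximation of this curve, so this is not bit-exact G.711, and the function names are hypothetical. It also works out the bit budget per sample for each codec in the list above, assuming 8000 samples per second.

```python
import math

MU = 255.0  # standard u-law parameter; continuous-curve sketch, not bit-exact G.711

def mulaw_compress(x: float) -> float:
    """Continuous u-law curve: maps [-1, 1] to [-1, 1], expanding small values."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def mulaw_roundtrip(x: float, bits: int = 8) -> float:
    """Compress, quantize uniformly to 2**bits levels, then expand."""
    levels = 2 ** bits
    step = 2.0 / levels
    y = mulaw_compress(x)
    index = max(-levels // 2, min(levels // 2 - 1, math.floor(y / step)))
    return mulaw_expand((index + 0.5) * step)

# Relative error stays small for quiet and loud samples alike.
for x in (0.01, 0.1, 0.9):
    print(x, abs(mulaw_roundtrip(x) - x) / x)

# Bit budget at 8000 samples/s, and compression relative to 64Kb/s u-law PCM.
for name, rate in [("u-law PCM", 64000), ("ADPCM", 32000), ("CELP", 4800), ("LPC10", 2400)]:
    print(f"{name}: {rate / 8000:.2f} bits/sample, {64000 / rate:.0f}:1")
```

For comparison, linear 8-bit PCM over the same range has a step of 2/256, so a 0.01 sample can be off by up to about 0.004, nearly 40% of its value. At the other end, LPC10 has only 0.3 bits per sample to spend, which is why it has to model the voice rather than code the waveform.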

The next set of clips illustrates how well these codecs perform in a noisy environment. Notice how in LPC10 the background noise sometimes triggers the wrong voiced/unvoiced decision by the codec. This error is presumably also happening in CELP and G.729, but the residual coding mostly manages to compensate.
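To see why noise can flip a voiced/unvoiced decision, here is a deliberately toy classifier in Python based only on zero-crossing rate. This is my own simplification for illustration, not the classifier LPC10 actually uses (which combines several measurements); the threshold of 0.25, the synthetic "vowel" and the function names are all assumptions. Voiced speech is dominated by low-frequency periodic energy and crosses zero rarely; noise crosses zero far more often, so enough background noise makes a voiced frame look unvoiced.

```python
import math
import random

# Toy voicing detector for illustration only; threshold and names are hypothetical.
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_voiced(frame, zcr_threshold=0.25):
    """Voiced frames are low-frequency and periodic, so they rarely cross zero."""
    return zero_crossing_rate(frame) < zcr_threshold

fs = 8000
n = 240                                   # 30 ms frame at 8KHz
rng = random.Random(0)
# Crude voiced frame: 150Hz fundamental plus its third harmonic.
vowel = [math.sin(2 * math.pi * 150 * i / fs) + 0.5 * math.sin(2 * math.pi * 450 * i / fs)
         for i in range(n)]
noisy_vowel = [s + rng.gauss(0, 1.5) for s in vowel]   # loud white background noise

print(is_voiced(vowel), is_voiced(noisy_vowel))
```

A wrong "unvoiced" decision makes LPC10 synthesise that frame from noise instead of a pitch pulse train, which is the rough, whispery breakup you can hear in the noisy LPC10 clip.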

[I've lost the reference for where these came from - if you find them again, let me know so I can give credit]

Original PCM 16-bit linear file (128Kb/s)
.wav file [8KHz sampled, 16 bit linear, mono, 342KB]
G.729 (8Kb/s)
.wav file [8KHz sampled, 16 bit linear, mono, 342KB]
CELP (4.8Kb/s)
.wav file [8KHz sampled, 16 bit linear, mono, 342KB]
LPC10 (2.4Kb/s)
.wav file [8KHz sampled, 16 bit linear, mono, 342KB]

The third set of clips illustrates what happens when you try to code music with a low-bitrate speech codec. Pretty funny, but not very nice.

[I've lost the reference for where these came from - if you find them again, let me know so I can give credit]

G.729 (8Kb/s)
.wav file [8KHz sampled, 16 bit linear, mono, 87KB]
LPC10 (2.4Kb/s)
.wav file [8KHz sampled, 16 bit linear, mono, 87KB]

Music Coding

Masking Effects

These clips illustrate the frequency masking effects exploited by perceptual codecs. Each consists of a low-amplitude tone that pulses on and off, and then a second, higher-amplitude tone that appears. If the louder tone is loud enough and close enough in frequency to the first tone, the first tone becomes inaudible, so a perceptual codec would not need to encode it. In all of these clips the louder tone has the same amplitude; only its frequency differs.

550Hz quiet tone, 100Hz loud tone. Quiet tone is not masked.
.wav file [48KHz sampled, 16 bit, mono, 2.3MB]
550Hz quiet tone, 200Hz loud tone. Quiet tone is not masked.
.wav file [48KHz sampled, 16 bit, mono, 2.3MB]
550Hz quiet tone, 500Hz loud tone. Quiet tone is masked.
.wav file [48KHz sampled, 16 bit, mono, 2.3MB]
550Hz quiet tone, 1000Hz loud tone. Quiet tone is not masked.
.wav file [48KHz sampled, 16 bit, mono, 2.3MB]
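Masking depends on distance along the ear's critical-band (Bark) scale rather than distance in Hz. As a rough check on the four clips above, the Python sketch below converts each frequency to Bark using the Zwicker and Terhardt approximation and prints how far the 550Hz tone sits from each masker. This is only an indicator, not a masking model: real masking also depends on level, and it spreads further upwards in frequency than downwards.

```python
import math

def hz_to_bark(f: float) -> float:
    """Zwicker & Terhardt approximation of the critical-band (Bark) scale."""
    return 13 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500) ** 2)

quiet = 550
for loud in (100, 200, 500, 1000):
    distance = hz_to_bark(quiet) - hz_to_bark(loud)
    print(f"{loud}Hz masker: quiet tone is {distance:+.2f} Bark away")
```

Only the 500Hz masker lies within one critical band of the 550Hz tone, which matches the one clip where the quiet tone is masked.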

These two clips illustrate that noise can be a more effective masker than a pure tone of the same amplitude. In these, the same 550Hz tone pulses on and off, but loud narrowband noise is used instead of a loud tone. The noise does not need to be as loud to mask the 550Hz tone.

550Hz quiet tone, 500Hz narrowband noise (25Hz bandwidth Butterworth filter). Quiet tone is masked.
.wav file [48KHz sampled, 16 bit, mono, 2.3MB]
550Hz quiet tone, 1000Hz narrowband noise (25Hz bandwidth Butterworth filter). Quiet tone is not masked.
.wav file [48KHz sampled, 16 bit, mono, 2.3MB]
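Narrowband noise of this kind is just white noise passed through a narrow band-pass filter. The clips used a Butterworth filter; as a simpler stand-in, the Python sketch below uses a second-order band-pass biquad from the widely used RBJ "Audio EQ Cookbook" formulas, with Q = centre frequency / bandwidth. Its skirts are not identical to the filter used for the clips, and the function names are hypothetical.

```python
import cmath
import math
import random

# Stand-in filter for illustration: RBJ band-pass biquad, not the Butterworth filter used for the clips.
def bandpass_coeffs(f0: float, bandwidth: float, fs: float):
    """Second-order band-pass with 0 dB peak gain at f0."""
    w0 = 2 * math.pi * f0 / fs
    q = f0 / bandwidth
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def biquad(x, b, a):
    """Direct form I filtering of the sequence x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def gain_at(f, b, a, fs):
    """Magnitude of the filter's frequency response at f Hz."""
    z_inv = cmath.exp(-1j * 2 * math.pi * f / fs)   # z^-1 on the unit circle
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)

fs = 48000
b, a = bandpass_coeffs(500, 25, fs)
rng = random.Random(1)
white = [rng.gauss(0, 1) for _ in range(fs)]         # one second of white noise
narrowband = biquad(white, b, a)                      # noise concentrated around 500Hz
print(gain_at(500, b, a, fs), gain_at(1000, b, a, fs))
```

The filter passes 500Hz at unity gain but attenuates 1000Hz by roughly 30 dB, so the noise really is confined to a narrow band around its centre frequency.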

MP3 Encoding

Several clips illustrating MP3's noise behaviour. At 192Kb/s, MP3 is pretty comparable with CD quality. At 128Kb/s I can hear the difference, but only when listening on good speakers or good headphones. Different people seem to have different thresholds for hearing MP3 compression artifacts.

At lower bitrates, MP3 artifacts become very noticeable. At 64Kb/s, an MP3 encoder would normally reduce the sample rate from 44.1KHz to 22.05KHz, sacrificing higher frequencies to avoid introducing unacceptable noise in the more important lower frequencies. In these samples, I've forced LAME to keep a 44.1KHz sample rate so the artifacts become more noticeable. Once you've heard them at 32Kb/s, you start to understand what sort of effects to listen for at higher bitrates.
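Some rough arithmetic shows why the encoder normally trades away bandwidth at low bitrates. The Python sketch below (hypothetical helper names; it assumes 1Kb/s = 1000 bits/s and stereo at 16 bits per sample for CD audio) computes the compression ratio relative to CD and the average number of coded bits available per PCM sample at 44.1KHz and at 22.05KHz.

```python
# Back-of-envelope bit budgets; helper names are illustrative only.
CD_RATE = 44100 * 16 * 2          # bits/s for 16-bit stereo at 44.1KHz (1,411,200)

def compression_ratio(bitrate: int) -> float:
    """How many times smaller than CD audio a stream at this bitrate is."""
    return CD_RATE / bitrate

def bits_per_sample(bitrate: int, sample_rate: int, channels: int = 2) -> float:
    """Average coded bits available per PCM sample."""
    return bitrate / (sample_rate * channels)

for kbps in (192, 128, 64, 32):
    rate = kbps * 1000
    print(f"{kbps}Kb/s: {compression_ratio(rate):.1f}:1, "
          f"{bits_per_sample(rate, 44100):.2f} bits/sample at 44.1KHz, "
          f"{bits_per_sample(rate, 22050):.2f} at 22.05KHz")
```

At 32Kb/s and 44.1KHz the encoder has well under half a bit per sample to spend; halving the sample rate doubles that budget for the frequencies that remain, at the cost of everything above 11.025KHz.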

Mendelssohn at 128Kb/s.
MP3 file [44.1KHz sampled, Joint stereo encoding, 908KB]
Mendelssohn at 64Kb/s.
MP3 file [44.1KHz sampled, Joint stereo encoding, 908KB]
Mendelssohn at 32Kb/s.
MP3 file [44.1KHz sampled, Joint stereo encoding, 908KB]

Ogg Vorbis Encoding

Finally, a comparison between MP3 and Ogg Vorbis at 64Kb/s. At this data rate, the MP3 encoder has reduced the sample rate to 22.05KHz, whereas Vorbis is still using 44.1KHz, so the main difference is the MP3's loss of higher frequencies. Compare this to the Mendelssohn MP3 above, kept at 44.1KHz and compressed to 64Kb/s, where the artifacts are obvious and annoying.

[Sound clips taken from xiph.org listening comparisons.]

Ogg Vorbis Compilation at 64Kb/s.
.wav file [44.1KHz sampled, Joint stereo encoding, 12MB]

MP3 Compilation at 64Kb/s.
.wav file [22.05KHz sampled, Joint stereo encoding, 6MB]