GZ05: Multimedia Systems: Audio Samples
This page gives all the audio samples I played during the GZ05 lectures, and a
few more that there wasn't time for. The speakers in the lecture theatres
aren't the best, so some of the artifacts were not as clear as they might be.
I suggest listening in a fairly quiet environment, with a good set of
headphones.
Some of the sound clips are quite large. Most of them are given as .wav files
(uncompressed PCM, either 8-bit or 16-bit) because any further compression
might interact badly with the compression artifacts I want you to hear.
Audio Basics
These sound clips illustrate the effects of sampling and quantization noise
(in the time domain).
Aliasing Effects
Tone that increases from 50Hz to 8KHz, then decreases back to 50Hz again.
- The original 48KHz sampled file:
.wav file[8 bit, mono, 384KB]
- Downsampled to 8KHz using a 4KHz anti-aliasing filter:
.wav file [8 bit, mono, 64KB]
- Downsampled to 8KHz by simply taking one sample in six:
.wav file [8 bit, mono,
64KB]
In the last sample the aliasing is clear: during the middle part of the clip,
frequencies above 4KHz are "folded" back into the region below 4KHz.
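The folding described above can be sketched in a few lines of Python. This is only an illustration, not how the clips were produced: the function names `alias_frequency` and `decimate_naive` are made up here, and the 5KHz test tone is an arbitrary choice.

```python
import math


def alias_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a pure tone at f_hz once sampled at sample_rate_hz."""
    f = f_hz % sample_rate_hz          # f and f + k*fs give identical samples
    return min(f, sample_rate_hz - f)  # the upper half folds back below fs/2


def decimate_naive(samples, factor):
    """Keep one sample in every `factor`, with no anti-aliasing filter."""
    return samples[::factor]


# Where tones from the sweep end up after naive decimation to 8KHz.
for tone_hz in (50, 3000, 5000, 7000):
    print(tone_hz, "Hz is heard at", alias_frequency(tone_hz, 8000), "Hz")

# A 5KHz cosine sampled at 48KHz, then decimated by six, matches a 3KHz
# cosine sampled directly at 8KHz (up to floating-point rounding).
high = [math.cos(2 * math.pi * 5000 * n / 48000) for n in range(480)]
low = [math.cos(2 * math.pi * 3000 * k / 8000) for k in range(80)]
folded = decimate_naive(high, 6)
print(max(abs(a - b) for a, b in zip(folded, low)) < 1e-9)
```

Once the samples have been thrown away, nothing in the 8KHz data distinguishes the folded tone from a real one, which is why the low-pass filter has to be applied before downsampling rather than after.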
Sampling and Aliasing of Music
A clip from Jean-Michel Jarre's Equinox:
- The original file, at 48KHz sampling:
.wav file[2MB]
- Downsampled to 8KHz using a 4KHz anti-aliasing (low-pass) filter:
.wav file[2MB]
- Downsampled to 8KHz by simply taking one sample in six:
.wav file[2MB]
The difference between the second and third clips is aliasing noise: sounds above 4KHz that are sampled at 8KHz reappear as noise below 4KHz.
Quantization Effects
A simple tone, quantized to 256 linear levels, 16 linear levels, and 4 linear
levels.
- 256 levels: .wav file [48KHz sampled, 8 bit,
mono, 192KB]
- 16 levels: .wav file [48KHz sampled, 8 bit,
mono, 192KB]
- 4 levels: .wav file [48KHz sampled, 8 bit,
mono, 192KB]
As the number of levels is decreased, more square edges are introduced, and
higher frequency quantization noise appears.
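As a rough sketch of what linear quantization does to a tone, the Python below rounds each sample to the nearest of N evenly spaced levels. The quantizer design (levels spread evenly over -1.0 to 1.0, round to nearest) and the 440Hz tone are assumptions for illustration; the clips above may have been made differently.

```python
import math


def quantize(x, levels):
    """Map x in [-1.0, 1.0] to the nearest of `levels` evenly spaced values."""
    step = 2.0 / (levels - 1)
    index = round((x + 1.0) / step)
    return -1.0 + index * step


SAMPLE_RATE = 48000
TONE_HZ = 440  # assumed; the frequency of the tone in the clips is not stated

# Ten milliseconds of a full-scale sine tone.
tone = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 100)]

for levels in (256, 16, 4):
    coarse = [quantize(s, levels) for s in tone]
    worst = max(abs(a - b) for a, b in zip(tone, coarse))
    print(levels, "levels: largest error", round(worst, 4),
          "distinct values used", len(set(coarse)))
```

The largest error is about half a step, so it grows quickly as levels are removed, and with only four levels the sine has effectively become a staircase.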
Speech Coding
This first set of clips gives a basic comparison of speech codecs.
The phrase "A lathe is a big tool. Grab every dish of sugar.",
spoken in a quiet environment. This sentence is a bit strange, but it
has a good mix of sounds including "th" (lathe) and "sh" (dish, sugar)
which are unvoiced sounds, some difficult consonants "b" (big), "t"
(tool) and "G" (grab), which are part voiced, part unvoiced, and the
"s" in "is" which is pronounced like a "z", simultaneously voiced and
unvoiced.
[Sound files obtained from Data-Compression.com]
- Original PCM u-law
.wav file [8KHz
sampled, u-law, mono, 32KB]
- ADPCM (32 Kb/s)
.wav file [8KHz
sampled, u-law, mono, 32KB]
- CELP (4.8 Kb/s)
.wav file [8KHz
sampled, u-law, mono, 32KB]
- LPC10 (2.4 Kb/s)
.wav file [8KHz
sampled, u-law, mono, 32KB]
The next set of clips illustrates how well these codecs perform in a noisy
environment. Notice in LPC how the background noise sometimes triggers the
wrong voiced/unvoiced decision by the codec. This error is presumably also
happening in CELP and G.729, but the residue coding mostly manages to
compensate.
[I've lost the reference for where these came from - if you find them again, let me know so I can give credit]
- Original PCM 16-bit linear file (128Kb/s)
.wav file
[8KHz sampled, 16 bit linear, mono, 342KB]
- G.729 (8Kb/s)
.wav file
[8KHz sampled, 16 bit linear, mono, 342KB]
- CELP (4.8Kb/s)
.wav file
[8KHz sampled, 16 bit linear, mono, 342KB]
- LPC10
.wav file
[8KHz sampled, 16 bit linear, mono, 342KB]
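To put the codec bitrates in context, here is a small Python sketch of the arithmetic. The helper name `pcm_bitrate_kbps` is made up for this example; the LPC10 rate of 2.4Kb/s is taken from the first set of clips, and 1Kb/s is taken as 1000 bits per second.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed PCM audio, in Kb/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


original = pcm_bitrate_kbps(8000, 16)  # 8KHz, 16-bit linear, mono
print("Original PCM:", original, "Kb/s")

# Codec rates as quoted on this page.
for name, kbps in (("G.729", 8.0), ("CELP", 4.8), ("LPC10", 2.4)):
    print(name, "uses 1 bit for every", round(original / kbps, 1), "bits of PCM")
```

Note that the decoded .wav files are all the same size, because each has been expanded back to 16-bit linear PCM for playback.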
The third set of clips illustrates what happens when you try to code music with a low-bitrate speech codec. The results are pretty funny, but not very nice.
[I've lost the reference for where these came from - if you find them again, let me know so I can give credit]
- G.729 (8Kb/s)
.wav file
[8KHz sampled, 16 bit linear, mono, 87KB]
- LPC10
.wav file
[8KHz sampled, 16 bit linear, mono, 87KB]
Music Coding
Masking Effects
These clips illustrate frequency masking effects in perceptual codecs.
They consist of one low-amplitude tone that pulses on and off, and
then a second higher amplitude tone that appears. If the higher
amplitude tone is loud enough and close enough in frequency to the
first tone, the first tone becomes inaudible, and so a perceptual
codec would not need to encode it. In all of these clips the louder
tone is of the same amplitude - all that differs is the frequency.
- 550Hz quiet tone, 100Hz loud tone. Quiet tone is not masked.
.wav file
[48KHz sampled, 16 bit, mono, 2.3MB]
- 550Hz quiet tone, 200Hz loud tone. Quiet tone is not masked.
.wav file
[48KHz sampled, 16 bit, mono, 2.3MB]
- 550Hz quiet tone, 500Hz loud tone. Quiet tone is masked.
.wav file
[48KHz sampled, 16 bit, mono, 2.3MB]
- 550Hz quiet tone, 1000Hz loud tone. Quiet tone is not masked.
.wav file
[48KHz sampled, 16 bit, mono, 2.3MB]
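If you want to experiment with other frequencies, a test signal of this shape can be generated with a short Python sketch like the one below. The amplitudes, pulse length, clip length and the point where the loud tone starts are all guesses for illustration, not the values used for the clips above, and `masking_clip` and `write_wav` are names invented here.

```python
import math
import struct
import wave

SAMPLE_RATE = 48000


def masking_clip(quiet_hz, loud_hz, seconds=4.0,
                 quiet_amp=0.02, loud_amp=0.5, pulse_s=0.25):
    """Quiet tone pulsing on and off; a loud tone joins for the second half."""
    samples = []
    for n in range(int(seconds * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        pulse_on = int(t / pulse_s) % 2 == 0
        quiet = quiet_amp * math.sin(2 * math.pi * quiet_hz * t) if pulse_on else 0.0
        loud = loud_amp * math.sin(2 * math.pi * loud_hz * t) if t >= seconds / 2 else 0.0
        samples.append(quiet + loud)
    return samples


def write_wav(path, samples):
    """Write samples in [-1.0, 1.0] as a 16-bit mono .wav file."""
    frames = struct.pack("<%dh" % len(samples),
                         *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(frames)


# Example: write_wav("mask_500.wav", masking_clip(550, 500))
```

Changing only `loud_hz` between runs reproduces the comparison above: the loud tone keeps the same amplitude and only its distance in frequency from the quiet tone changes.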
These two clips illustrate that noise can be more effective at masking than a pure tone of the same amplitude. In these, the same 550Hz tone pulses on and off, but instead of a loud tone, loud narrowband noise is used. The noise does not need to be as loud as the tone to mask the 550Hz tone.
- 550Hz quiet tone, 500Hz narrowband noise (25Hz bandwidth, Butterworth filter). Quiet tone is masked.
.wav file
[48KHz sampled, 16 bit, mono, 2.3MB]
- 550Hz quiet tone, 1000Hz narrowband noise (25Hz bandwidth, Butterworth filter). Quiet tone is not masked.
.wav file
[48KHz sampled, 16 bit, mono, 2.3MB]
MP3 Encoding
Several clips to illustrate MP3's noise behaviour. At 192Kb/s, MP3 is
pretty comparable with CD quality. At 128Kb/s I can hear the
difference, but only when listening on good speakers or good
headphones. Different people seem to have different thresholds for
hearing MP3 compression artifacts.
At lower bitrates, MP3 artifacts become very noticeable. At 64Kb/s,
MP3 would normally have reduced from a 44.1KHz sample rate down to a
22.05 KHz sample rate, sacrificing higher frequencies to avoid
introducing unacceptable noise in the more important lower
frequencies. In these samples, I've forced LAME to maintain a 44.1KHz
sample rate so the artifacts become more noticeable. Once you've heard
them at 32Kb/s, you start to understand what sort of effects to
listen for at higher bitrates.
- Mendelssohn at 128Kb/s.
MP3 file
[44.1KHz sampled, Joint stereo encoding, 908KB]
- Mendelssohn at 64Kb/s.
MP3 file
[44.1KHz sampled, Joint stereo encoding, 908KB]
- Mendelssohn at 32Kb/s.
MP3 file
[44.1KHz sampled, Joint stereo encoding, 908KB]
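For a sense of how hard the encoder is working at each of these bitrates, the Python sketch below compares them with uncompressed CD audio (44.1KHz, 16 bit, stereo) and shows the bandwidth given up when the sample rate is halved. The function names are made up for this example.

```python
# Uncompressed CD audio: 44.1KHz, 16 bits per sample, two channels.
CD_KBPS = 44100 * 16 * 2 / 1000


def compression_ratio(mp3_kbps):
    """How many bits of CD audio each encoded bit has to stand in for."""
    return CD_KBPS / mp3_kbps


def nyquist_hz(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2


for kbps in (192, 128, 64, 32):
    print(kbps, "Kb/s is roughly", round(compression_ratio(kbps), 1), "to 1")

print("44.1KHz sampling keeps frequencies up to", nyquist_hz(44100), "Hz")
print("22.05KHz sampling keeps frequencies up to", nyquist_hz(22050), "Hz")
```

Halving the sample rate discards everything above about 11KHz, leaving all of the bits for the lower frequencies. Forcing 44.1KHz at 64Kb/s or 32Kb/s spreads the same bits over twice the bandwidth, which is why the artifacts in these clips are so easy to hear.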
Ogg Vorbis Encoding
Finally, a comparison between MP3 and Ogg Vorbis at 64Kb/s. At this
data rate, MP3 has reduced the sample rate to 22.05KHz, whereas Vorbis
is still using 44.1KHz, so the main difference is a loss of higher
frequencies. But compare this to the Mendelssohn MP3 above sampled at
44.1KHz and compressed to 64Kb/s, where the artifacts are obvious and
annoying.
[Sound clips taken from xiph.org listening comparisons.]
- Ogg Vorbis Compilation at 64Kb/s.
.wav file
[44.1KHz sampled, Joint stereo encoding, 12MB]
- MP3 Compilation at 64Kb/s.
.wav file
[22.05KHz sampled, Joint stereo encoding, 6MB]