Data compression

Devices that encode and decode, as well as compress and decompress, are called CODECs, or COder-DECoders. Sometimes the term is used for audio devices, but it mainly refers to video devices.

A video CODEC can be anything from the simplest A2D (analogue-to-digital) device, through to something that does picture pre-processing and even has network adapters built into it (i.e. a videophone!). A CODEC usually does most of its work in hardware, but there is no reason not to implement everything (except the A2D capture :-) in software on a reasonably fast processor.

The most expensive and complex component of a CODEC is the compression/decompression part. There are a number of international standards, as well as any number of proprietary compression techniques for video.

 

Data compression

 

 

Video compression


Video compression can remove the requirement for very high data rates and move video transmission and storage into a regime very similar to that for audio. In fact, in terms of tolerance for poor quality, it seems humans are better at adapting to poor visual information than to poor audio information. A simple-minded calculation shows:

1024 x 1024 pixels,

3 bytes per pixel (24 bit RGB)

25 Frames per second

 

yields 75Mbytes/second, or 600Mbps - this is right on the limit of modern transmission capacity.

Even in this age of deregulation and cheaper telecoms, and larger, faster disks, this is profligate.

On the other hand, for a scene with a human face in it, as few as 64 pixels square and 10 frames per second might suffice for a meaningful image.

64x 64 pixels

3 bytes per pixel (24 bit RGB)

10 Frames per second

 

yields 122 Kbytes/second, or just under 1 Mbps - this is achievable on modern LANs and high-speed WANs, but still not friendly!
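The arithmetic behind both figures is just pixels times bytes times frame rate; a minimal, purely illustrative Python sketch:

    def raw_video_rate(width, height, bytes_per_pixel, frames_per_second):
        # Raw, uncompressed video rate in bytes per second and bits per second.
        bytes_per_second = width * height * bytes_per_pixel * frames_per_second
        return bytes_per_second, bytes_per_second * 8

    print(raw_video_rate(1024, 1024, 3, 25))  # ~75 Mbytes/s, ~600 Mbps
    print(raw_video_rate(64, 64, 3, 10))      # ~120 Kbytes/s, just under 1 Mbps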

Notice that in the last simple example, we did two things to the picture.

1. We used less "space" for each frame by sending less "detail".

2. We sent frames less frequently since little is moving.

 

This is a clue as to how to go about improving things. Basically, if there isn’t much information to send, we avoid sending it. Spatial and temporal domain compression are both used in many of the standards.

 

Video compression

 

1024 x 1024 pixels,

3 bytes per pixel (24 bit RGB)

25 Frames per second

 

yields 75Mbytes/second, or 600Mbps!!!

 

1. We could use less "space" for each frame by sending less "detail".

2. We could send frames less frequently since little is moving.

 

 

Lossy versus Lossless Compression

If a frame contains a lot of image that is the same, maybe we can encode it with fewer bits without losing any information (run-length encoding, logically larger pixels, etc.). On the other hand, we can take advantage of other features of natural scenes to reduce the number of bits - for example, nature is very fractal, or self-similar: there are lots of features (sky, grass, lines on a face, etc.) that are repetitive at any level of detail. If we leave out some levels of detail, the eye (and the human visual cortex) ends up being fooled a lot of the time.
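As a small illustration of the lossless end of this spectrum, here is a minimal run-length encoder in Python: it exploits runs of identical samples and loses nothing.

    import itertools

    def run_length_encode(samples):
        # Collapse each run of identical values into a (value, run_length) pair.
        return [(value, len(list(run))) for value, run in itertools.groupby(samples)]

    def run_length_decode(pairs):
        return [value for value, count in pairs for _ in range(count)]

    scan_line = [255] * 60 + [0] * 4          # a mostly-uniform scan line
    encoded = run_length_encode(scan_line)
    assert run_length_decode(encoded) == scan_line
    print(encoded)                            # [(255, 60), (0, 4)]: 2 pairs instead of 64 samples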

 

Lossy versus Lossless Compression

 

 

Hierarchical coding

Hierarchical coding is based on the idea that the coding takes the form of a quality hierarchy, where the lowest layer contains the minimum information needed for intelligibility. Succeeding layers of the hierarchy add increasing quality to the scheme.

This compression mechanism is ideal for transmission over packet-switched networks, where network resources are shared between many traffic streams and delays, losses and errors are expected.

Each packet carries data from only one layer, so packets can be marked according to their importance to intelligibility for the end-user. The network can use this information to decide which packets to drop or delay and which should take priority. It should be noted that priority bits already exist in some protocols, such as IP.

Hierarchical coding is also ideal for multicast transmission over links with different bandwidths. With a non-hierarchical encoding scheme, either the whole multicast stream adapts to the capabilities of the lowest-bandwidth link, degrading the video/audio quality for everyone where it could have been better, or the slow link suffers congestion and the sites behind it lose some of the intelligibility of their received video/audio. With hierarchical coding, the less important (enhancement-layer) packets can be filtered out wherever a low-bandwidth link is encountered, preserving intelligibility for the sites affected by these links while still delivering better quality to sites with higher bandwidth.
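As a rough sketch of this idea (not taken from any standard; the packet format and capacity figure below are invented for illustration), a filter at a low-bandwidth link might forward whole layers in order of importance until the link budget is used up:

    def filter_layers(packets, link_budget_bytes):
        # Forward complete layers in order of importance (layer 0 = base layer);
        # drop the remaining, less important enhancement layers.
        forwarded, used = [], 0
        for layer in sorted({p["layer"] for p in packets}):
            layer_bytes = sum(p["size"] for p in packets if p["layer"] == layer)
            if used + layer_bytes > link_budget_bytes:
                break
            forwarded += [p for p in packets if p["layer"] == layer]
            used += layer_bytes
        return forwarded

    stream = [{"layer": 0, "size": 400}, {"layer": 1, "size": 800}, {"layer": 2, "size": 1600}]
    print(filter_layers(stream, 1000))        # only the base layer fits this link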

Schemes that are now in relatively commonplace use include H.261 for videotelephony, MPEG for digital TV and VCRs, and JPEG for still images. Most current standards are based on one simple technique, so first let's look at that.

 

Hierarchical coding

 

 

JPEG

The JPEG standard's goal has been to develop a method of continuous-tone image compression for both color and greyscale images. The standard defines four modes (listed below):

JPEG uses the Discrete Cosine Transform to compress spatial redundancy within an image in all of its modes apart from the lossless one, where a predictive method is used instead.

As JPEG was essentially designed for the compression of still images, it makes no use of temporal redundancy, which is a very important element in most video compression schemes. Thus, despite the availability of real-time JPEG video compression hardware, its use will be quite limited due to its poor video quality.

 

JPEG

    1. Sequential - scanned left to right, top to bottom
    2. Progressive - coarse to clear
    3. Lossless
    4. Hierarchical

 

H261

H.261 is the most widely used international video compression standard for video conferencing. The standard describes the video coding and decoding methods for the moving picture component of an audiovisual service at rates of p * 64 kbps, where p is in the range 1 to 30. The standard targets, and is really only suitable for, applications using circuit-switched networks as their transmission channels. This is understandable, as ISDN with both basic and primary rate access was the communication channel considered within the framework of the standard.

H.261 is usually used in conjunction with other control and framing standards such as H.221, H.230, H.242 and H.320, of which more later.

 

H.261

 

 

H.261 SOURCE IMAGES format

The source coder operates only on non-interlaced pictures. Pictures are coded as a luminance component and two color difference components (Y, Cb, Cr). The Cb and Cr matrices are half the size of the Y matrix.

H.261 supports two image resolutions: QCIF, which is 144x176 pixels, and, optionally, CIF, which is 288x352.

 

H.261 SOURCE IMAGES format

    1. CIF = 288*352 pixels
    2. QCIF = 144*176 pixels

 

H.261 SOURCE CODER

The main elements in an H.261 encoder are:

    1. Prediction
    2. Block Transformation
    3. Quantization

 

H.261 SOURCE CODER

 

Picture: h261-encoder.ps

 

H.261 Prediction

H.261 defines two types of coding: INTRA coding, where blocks of 8x8 pixels are encoded only with reference to themselves and are sent directly to the block transformation process, and INTER coding, where frames are encoded with respect to another (reference) frame.

 

A prediction error is calculated between a 16x16 pixel region (a macroblock) and the corresponding (recovered) macroblock in the previous frame. The prediction errors of transmitted blocks (the criteria for transmission are not standardized) are then sent to the block transformation process.

 

 

H.261 prediction

 


 

H.261 Block transformation

H.261 supports motion compensation in the encoder as an option. In motion compensation, a search area is constructed in the previous (recovered) frame to determine the best reference macroblock. Both the prediction error and the motion vector specifying the size and direction of the displacement between the encoded macroblock and the chosen reference are sent. The search area, as well as how to compute the motion vectors, is not subject to standardization; both horizontal and vertical components of the vectors must, however, have integer values in the range -15 to +15.
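Since the search strategy is not standardized, the following Python sketch uses a simple exhaustive search with a sum-of-absolute-differences criterion (a common but by no means mandatory choice); only the ±15 limit on the vector components comes from the standard. Frames are assumed to be 2D numpy arrays of luminance samples.

    import numpy as np

    def motion_search(current, previous, top, left, search_range=15):
        # For the 16x16 macroblock at (top, left) in the current frame, find the
        # best-matching macroblock in the previous (recovered) frame and return
        # the motion vector together with the prediction error block.
        block = current[top:top + 16, left:left + 16].astype(np.int32)
        height, width = previous.shape
        best_vector, best_sad = (0, 0), None
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + 16 > height or x + 16 > width:
                    continue
                candidate = previous[y:y + 16, x:x + 16].astype(np.int32)
                sad = int(np.abs(block - candidate).sum())   # sum of absolute differences
                if best_sad is None or sad < best_sad:
                    best_sad, best_vector = sad, (dy, dx)
        dy, dx = best_vector
        reference = previous[top + dy:top + dy + 16, left + dx:left + dx + 16].astype(np.int32)
        return best_vector, block - reference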

In block transformation, INTRA coded frames as well as prediction errors are composed into 8x8 blocks. Each block is processed by a two-dimensional forward DCT (FDCT) function.
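A minimal sketch of the transform itself, written directly from the 2-D DCT definition (real encoders use fast factorized forms, so this is illustrative only):

    import numpy as np

    def fdct_8x8(block):
        # Two-dimensional forward DCT of an 8x8 block, straight from the definition.
        coeffs = np.zeros((8, 8))
        for u in range(8):
            for v in range(8):
                cu = 1 / np.sqrt(2) if u == 0 else 1.0
                cv = 1 / np.sqrt(2) if v == 0 else 1.0
                total = 0.0
                for x in range(8):
                    for y in range(8):
                        total += block[x, y] * np.cos((2 * x + 1) * u * np.pi / 16) \
                                             * np.cos((2 * y + 1) * v * np.pi / 16)
                coeffs[u, v] = 0.25 * cu * cv * total
        return coeffs

    flat_block = np.full((8, 8), 100.0)       # a uniform block...
    print(np.round(fdct_8x8(flat_block)))     # ...has all its energy in the DC coefficient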

 

H.261 Block Transformation

 

 

H.261 Quantization & Entropy Coding

The purpose of this step is to achieve further compression by representing the DCT coefficients with no greater precision than is necessary to achieve the required quality. The number of quantizers is 1 for the INTRA DC coefficient and 31 for all other coefficients.

Entropy coding achieves extra (lossless) compression by assigning shorter code-words to frequent events and longer code-words to less frequent events. Huffman coding is usually used to implement this step.
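A rough Python sketch of both steps (the step size and coefficient values below are invented; the real H.261 quantizers and code tables are more involved):

    import heapq
    from collections import Counter
    import numpy as np

    def quantize(coefficients, step):
        # Uniform quantization: a coarser step gives more compression, lower quality.
        return np.round(np.asarray(coefficients) / step).astype(int)

    def huffman_code_lengths(symbols):
        # Return a code length per symbol; frequent symbols end up with shorter codes.
        heap = [(count, [sym]) for sym, count in Counter(symbols).items()]
        lengths = {sym: 0 for sym in set(symbols)}
        heapq.heapify(heap)
        while len(heap) > 1:
            count1, group1 = heapq.heappop(heap)
            count2, group2 = heapq.heappop(heap)
            for sym in group1 + group2:
                lengths[sym] += 1
            heapq.heappush(heap, (count1 + count2, group1 + group2))
        return lengths

    dct_coefficients = [802.0, -31.5, 12.0, 3.2, 0.4, 0.1, 0.0, 0.0]
    quantized = quantize(dct_coefficients, step=8).tolist()
    print(quantized)                              # [100, -4, 2, 0, 0, 0, 0, 0]
    print(huffman_code_lengths(quantized))        # the frequent zero gets the shortest code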

 

H.261 Quantization

 

 

H.261 Multiplexing

The video multiplexer structures the compressed data into a hierarchical bitstream that can be universally interpreted.

The hierarchy has four layers:

 

H.261 Multiplexing.

    1. Pictures (A video frame)
    2. Groups of Blocks (1/3 of QCIF picture)
    3. Macroblocks (16*16 luminance and two 8*8 chrominance blocks)
    4. Blocks (8*8 pixels)

 

H.261 Error Correction Framing

An error correction framing structure is described in the H.261 standard. The frame structure is shown in the figure. A BCH(511,493) code is used to protect the bit stream transmitted over ISDN; its use is optional at the decoder. The fill bit indicator allows data padding, thus ensuring transmission on every valid clock cycle.

 

H.261 Error Correction and Framing

 

Picture: h261-fec.ps

Picture: H221.ps

Picture: H221_framing.ps

 

 

H.261 Summary

Though H.261, as mentioned before, can be considered the most widely used video compression standard in the field of multimedia conferencing, it has its limitations as far as suitability for transmission over packet-switched data networks is concerned. H.261 does not map naturally onto hierarchical coding. A few suggestions have been made as to how this could be done, but the standard itself offers no support for it. H.261 resolution is fine for conferencing applications; once more quality-critical video data needs to be compressed, the optional upper limit of CIF resolution can start to prove inadequate.

 

H.261 Summary

 

 

MPEG

The aim of the MPEG-II video compression standard is to cater for the growing need for generic coding methods for moving images for various applications, such as digital storage and communication. So, unlike the H.261 standard, which was specifically designed for the compression of moving images for video conferencing systems at p * 64 kbps, MPEG addresses a wider scope of applications.

 

MPEG

 

 

MPEG SOURCE IMAGES format

The source pictures consist of three rectangular matrices of integers: a luminance matrix (Y) and two chrominance matrices (Cb and Cr).

MPEG supports three formats:

    1. 4:2:0 - in this format the Cb and Cr matrices are one half the size of the Y matrix in both the horizontal and vertical dimensions.
    2. 4:2:2 - in this format the Cb and Cr matrices are one half the size of the Y matrix in the horizontal dimension and the same size in the vertical dimension.
    3. 4:4:4 - in this format the Cb and Cr matrices are the same size as the Y matrix in both the vertical and horizontal dimensions.

 

MPEG Source Images Format

 

 

MPEG frames

The output of the decoding process, for interlaced sequences, consists of a series of fields that are separated in time by a field period. The two fields of a frame may be coded independently (field-pictures) or can be coded together as a frame (frame pictures).

 

MPEG Frames

The diagram shows the intra, predictive and bi-directional frames that MPEG supports:

 

Picture: mpeg.ps

 

MPEG source coder

An MPEG source encoder consists of the same main elements as an H.261 encoder - prediction, block transformation, quantization and entropy coding - which are described in the following sections.

 

MPEG Prediction

MPEG defines three types of pictures:

1. Intrapictures (I-pictures)

 

These pictures are encoded only with respect to themselves. Each picture is composed into blocks of 8x8 pixels that are encoded without reference to any other picture and are sent directly to the block transformation process.

2. Predictive pictures (P-pictures)

 

These are pictures encoded using motion-compensated prediction from a past I-picture or P-picture. A prediction error is calculated between a 16x16 pixel region (macroblock) in the current picture and the past reference I- or P-picture. A motion vector is also calculated to determine the size and direction of the prediction. For progressive sequences, and for interlaced sequences with frame-coding, only one motion vector is calculated for P-pictures; for interlaced sequences with field-coding, two motion vectors are calculated. The prediction error is then composed into 8x8 pixel blocks and sent to the block transformation process.

3. Bi-directional pictures (B-pictures)

 

These are pictures encoded using motion-compensated prediction from a past and/or future I-picture or P-picture. A prediction error is calculated between a 16x16 pixel region in the current picture and the past as well as the future reference I- or P-picture. Two motion vectors are calculated: one to determine the size and direction of the forward prediction, the other to determine the size and direction of the backward prediction. For field-coded pictures in interlaced sequences, four motion vectors will thus be calculated.

It must be noted that a B-picture can never be used as a prediction picture.

The method of calculating the motion vectors as well as the search area for the best predictor is left to be determined by the encoder.
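As an illustrative sketch (assuming the motion-compensated reference macroblocks have already been fetched, and using the average of the two references as the predictor; the rounding here is illustrative), the B-picture prediction error could be formed like this:

    import numpy as np

    def b_picture_prediction_error(current_mb, past_mb, future_mb):
        # Predictor: the average of the motion-compensated past and future reference
        # macroblocks; the error is what goes on to block transformation.
        predictor = (past_mb.astype(np.int32) + future_mb.astype(np.int32) + 1) // 2
        return current_mb.astype(np.int32) - predictor

    past = np.random.randint(0, 256, (16, 16))
    future = np.random.randint(0, 256, (16, 16))
    current = (past + future) // 2                     # a perfectly interpolated block...
    print(np.abs(b_picture_prediction_error(current, past, future)).max())  # ...errs by at most 1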

 

MPEG prediction

 

 

 

 

MPEG Block Transformation

    1. Block transformation - INTRA coded blocks as well as prediction errors are processed by a two-dimensional DCT function.
    2. Quantization - the purpose of this step is to achieve further compression by representing the DCT coefficients with no greater precision than is necessary to achieve the required quality.
    3. Entropy coding - extra (lossless) compression is achieved by assigning shorter code-words to frequent events and longer code-words to less frequent events. Huffman coding is usually used to implement this step.

 

MPEG Block Transformation

 

MPEG Multiplexing

The video multiplexer structures the compressed data into a hierarchical bitstream that can be universally interpreted. The hierarchy has six layers:

    1. Video sequence - the highest syntactic structure of the coded bitstream. It can be looked at as a random access unit.

    2. Group of pictures - optional in MPEG II. This corresponds to a series of pictures; the first picture in the coded bitstream has to be an I-picture. Groups of pictures assist random access, and can also be used at scene cuts or in other cases where motion compensation is ineffective. Applications requiring random access, fast-forward or fast-reverse playback may use relatively short groups of pictures.

    3. Picture - corresponds to one picture in the video sequence. For field pictures in interlaced sequences, the interlaced picture is represented by two separate pictures in the coded stream. They are encoded in the same order in which they occur at the output of the decoder.

    4. Slice - corresponds to a group of macroblocks. The actual number of macroblocks within a slice is not subject to standardization, and slices do not have to cover the whole picture. It is a requirement that if the picture is subsequently used for predictions, then predictions shall only be made from those regions of the picture that were enclosed in slices.

    5. Macroblock - contains a section of the luminance component and the spatially corresponding chrominance components. A 4:2:0 macroblock consists of 6 blocks (4 Y, 1 Cb, 1 Cr); a 4:2:2 macroblock consists of 8 blocks (4 Y, 2 Cb, 2 Cr); a 4:4:4 macroblock consists of 12 blocks (4 Y, 4 Cb, 4 Cr).

    6. Block - corresponds to 8x8 pixels.

 

MPEG multiplexing

 

The structure of the MPEG bitstream is a tad more complex than that of H.261:

MPEG Picture Order

It must be noted that in MPEG the order of the pictures in the coded stream is the order in which the decoder processes them. The reconstructed frames are therefore not necessarily in the correct order for display. The following example shows such a case.

At the encoder input,

 

1  2  3  4  5  6  7  8  9  10 11 12 13

I  B  B  P  B  B  P  B  B  I  B  B  P

 

At the encoder output, in the coded bitstream and at the decoder input,

 

1  4  2  3  7  5  6  10 8  9  13 11 12

I  P  B  B  P  B  B  I  B  B  P  B  B

 

At the decoder output:

1  2  3  4  5  6  7  8  9  10 11 12 13
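A small Python sketch reproducing this reordering (the rule, as above, is that each reference (I or P) picture is transmitted before the B-pictures that depend on it):

    def display_to_coded_order(pictures):
        # Reorder (number, type) pairs from display order to coded/transmission order.
        coded, pending_b = [], []
        for number, picture_type in pictures:
            if picture_type == "B":
                pending_b.append((number, picture_type))   # B-pictures wait for their references
            else:                                          # I- or P-picture: a reference
                coded.append((number, picture_type))
                coded.extend(pending_b)
                pending_b = []
        return coded + pending_b

    display_order = [(1, "I"), (2, "B"), (3, "B"), (4, "P"), (5, "B"), (6, "B"), (7, "P"),
                     (8, "B"), (9, "B"), (10, "I"), (11, "B"), (12, "B"), (13, "P")]
    print([n for n, _ in display_to_coded_order(display_order)])
    # -> [1, 4, 2, 3, 7, 5, 6, 10, 8, 9, 13, 11, 12]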

 

MPEG Picture order

 

SCALEABLE EXTENSIONS

The scalability tools specified by MPEG II are designed to support applications beyond those supported by single-layer video. In scalable video coding, it is assumed that, given an encoded bitstream, decoders of various complexities can decode and display appropriate reproductions of the coded video. The basic scalability tools offered are data partitioning, SNR scalability, spatial scalability and temporal scalability. Combinations of these basic scalability tools are also supported and are referred to as hybrid scalability. In the case of basic scalability, two layers of video, referred to as the lower layer and the enhancement layer, are allowed, whereas in hybrid scalability up to three layers are supported.

 

 

MPEG Extensions

    1. Spatial scalability - this involves generating two spatial resolution video layers from a single video source, such that the lower layer is coded by itself to provide the basic spatial resolution and the enhancement layer employs the spatially interpolated lower layer and carries the full spatial resolution of the input video source.
    2. SNR scalability - this involves generating two video layers of the same spatial resolution but different video qualities from a single video source. The lower layer is coded by itself to provide the basic video quality and the enhancement layer is coded to enhance the lower layer. The enhancement layer, when added back to the lower layer, regenerates a higher quality reproduction of the input video.
    3. Temporal scalability - this involves generating two video layers, where the lower one is encoded by itself to provide the basic temporal rate and the enhancement layer is coded with temporal prediction with respect to the lower layer. These layers, when decoded and temporally multiplexed, yield the full temporal resolution of the video source.

 

    4. Data partitioning - this involves partitioning the video coded bitstream into two parts. One part carries the more critical parts of the bitstream (such as headers, motion vectors and DC coefficients); the other part carries less critical data such as the higher DCT coefficients (sketched below).
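A rough Python illustration of the data-partitioning idea (the breakpoint of three coefficients is purely illustrative):

    def partition_block(quantized_coefficients, breakpoint=3):
        # Critical partition: the first few low-frequency coefficients (plus, in a real
        # bitstream, headers and motion vectors).  Less critical partition: the
        # remaining higher-frequency coefficients.
        return quantized_coefficients[:breakpoint], quantized_coefficients[breakpoint:]

    critical, less_critical = partition_block([100, -4, 2, 0, 1, 0, 0, 0])
    print(critical, less_critical)            # [100, -4, 2] [0, 1, 0, 0, 0]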

Profiles and levels provide a means of defining subsets of the syntax and semantics, and thereby the decoder capabilities required to decode a certain stream. A profile is a defined subset of the entire bitstream syntax defined by MPEG II. A level is a defined set of constraints imposed on parameters in the bitstream.

 

MPEG Extensions

 

MPEG II Profiles

Five profiles are defined:

    1. Simple
    2. Main
    3. SNR scalable
    4. Spatially scalable
    5. High

along with four levels:

    1. Low
    2. Main
    3. High 1440
    4. High

 

MPEG Profiles

 

Picture: h261vmpegsize.ps

 

 

MPEG II

MPEG II is now an ISO standard. Due to the forward and backward temporal compression used by MPEG, better compression and better quality can be produced. As MPEG does not limit the picture resolution, high resolution data can still be compressed using MPEG. The scaleable extensions defined by MPEG can map neatly onto the hierarchical scheme explained in 2.1. However, the out-of-order processing which occurs on both the encoding and decoding side can introduce considerable latency; this is undesirable in video telephony and video conferencing.

Hardware MPEG encoders are quite expensive at the moment, though this should change in the near future. The new SunVideo board (see below) does support MPEG I encoding. Software implementations of MPEG I decoders are already available.

 

MPEG II

 

 

 

MPEG III and IV

MPEG III was going to be a higher quality encoding for HDTV. It transpired after some studies that MPEG II at higher rates is pretty good, and so MPEG III has been dropped.

MPEG IV is aimed at the opposite extreme - that of low bandwidth or low storage capacity environments (e.g. PDAs). It is based around model-based image coding schemes (i.e. knowing what is in the picture!). It is aimed at rates of up to 64 kbps.

 

MPEG III and IV

 

Subband coding

Subband coding is given here as an example of an encoding algorithm that maps neatly onto hierarchical coding. There are other examples of hierarchical encoding, but none of them is a standard or as widely used as international standards such as H.261 and MPEG.

Subband coding is based on the fact that the low spatial frequency components of a picture carry most of the information within the picture. The picture can thus be divided into its spatial frequency components, and the coefficients describing each band can then be quantized according to their importance, lower frequencies being more important. The most obvious mapping is to allocate each subband (frequency) to one of the hierarchy layers. If inter-frame coding is used, it has to be adjusted so as not to create any upward dependencies.
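A rough Python sketch of a single-level, Haar-style subband split (assuming even image dimensions; real subband coders use longer filters), which makes the mapping onto layers explicit: the low-low band alone gives a quarter-size approximation, and the detail bands refine it.

    import numpy as np

    def haar_subbands(image):
        # Split an image into four quarter-size subbands using 2x2 averages/differences.
        a = image[0::2, 0::2].astype(np.float64)
        b = image[0::2, 1::2].astype(np.float64)
        c = image[1::2, 0::2].astype(np.float64)
        d = image[1::2, 1::2].astype(np.float64)
        ll = (a + b + c + d) / 4      # low frequencies: the base layer
        lh = (a - b + c - d) / 4      # differences between adjacent columns
        hl = (a + b - c - d) / 4      # differences between adjacent rows
        hh = (a - b - c + d) / 4      # diagonal differences
        return ll, lh, hl, hh

    image = np.tile(np.arange(8), (8, 1))     # a smooth horizontal ramp
    ll, lh, hl, hh = haar_subbands(image)
    print(np.abs(ll).sum(), np.abs(hh).sum()) # almost all the energy is in the LL band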

 

Subband Coding

 

 

DVI

Intel’s Digital Video Interactive compression scheme is based on the region encoding technique. Each picture is divided into regions, which in turn are split into subregions, and so on, until the regions can be mapped onto basic shapes to fit the required bandwidth and quality. The chosen shapes can be reproduced well at the decoder. The data sent is a description of the region tree and of the shapes at the leaves. This is an asymmetric coding, which requires a large amount of processing for encoding and less for decoding.

DVI, though not a standard, started to play an important role in the market. SUN's prototype DIME board used DVI compression, and it was planned to be incorporated in the new generation of SUN VideoPix cards.

This turned out not to happen. Intel canceled the development of the V3 DVI chips, and SUN's next generation of VideoPix, the SunVideo card, does not support DVI. The future of DVI is now in doubt.

 

DVI

 

 

 

CellB Compression

CellB image compression was introduced by SUN and is supported by its new SunVideo cards. CellB is based on the techniques of block truncation coding and vector quantization.

In vector quantization, the picture is divided into blocks and the coefficients describing each block are used as a vector. As the vector space in which the block vectors exist is not evenly populated by the blocks, it can be divided into subspaces selected to provide equal probability of a random vector falling in any of the subspaces. A prototype vector is then used to represent all blocks whose vectors fall into a given subspace.

The most processor-intensive part of vector quantization is the generation of the codebook, that is, the division of the vector space into subspaces. A copy of the codebook is sent to the other end. The image is then divided into blocks, each of which is represented by the codebook vector closest to it, and only that vector's label is sent. Decoding is done by looking up each label in the codebook and using the corresponding vector to represent the block.
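A minimal Python sketch of the encode/decode round trip, assuming the codebook has already been generated (codebook generation, the expensive part, is omitted); the codebook and block values below are invented:

    import numpy as np

    def vq_encode(block_vectors, codebook):
        # Label each block with the index of the nearest codebook vector.
        return [int(np.argmin(((codebook - vector) ** 2).sum(axis=1)))
                for vector in block_vectors]

    def vq_decode(labels, codebook):
        # Look each label up in the codebook and use the prototype vector.
        return [codebook[label] for label in labels]

    codebook = np.array([[0, 0, 0, 0], [128, 128, 128, 128], [255, 255, 255, 255]])
    blocks = np.array([[250, 251, 250, 249], [5, 3, 0, 1]])
    labels = vq_encode(blocks, codebook)
    print(labels)                             # [2, 0]: only the labels need be sent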

CellB uses two fixed codebooks. It takes 3-band YUV images as input; the width and height must be divisible by 4. The video is broken into cells of 16 pixels each, arranged in a 4x4 group. The 16 pixels in a cell are represented by a 16-bit mask and two intensities or colors; the mask specifies which of the two intensities to place at each pixel position. The mask and intensities can be chosen to maintain certain statistics of the cell, or they can be chosen to reduce contouring in a manner similar to ordered dither. This method is called Block Truncation Coding. It takes advantage of primitives already implemented in graphics accelerators to provide video decoding.
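A rough Python sketch of block truncation coding for one 4x4 cell, using the cell mean as the threshold and the means of the two pixel groups as the intensities (one statistics-preserving choice; CellB itself picks its values from its fixed codebooks):

    import numpy as np

    def btc_encode_cell(cell):
        # Encode a 4x4 cell as a 16-bit mask plus two intensities.
        threshold = cell.mean()
        above = cell >= threshold
        high = cell[above].mean()
        low = cell[~above].mean() if (~above).any() else high
        mask = 0
        for bit in above.flatten():
            mask = (mask << 1) | int(bit)     # pack the 16 pixel decisions into one word
        return mask, float(low), float(high)

    def btc_decode_cell(mask, low, high):
        bits = [(mask >> (15 - i)) & 1 for i in range(16)]
        return np.array([high if bit else low for bit in bits]).reshape(4, 4)

    cell = np.array([[10, 10, 200, 200]] * 4)
    mask, low, high = btc_encode_cell(cell)
    print(bin(mask), low, high)               # 16 bits + two intensities instead of 16 pixels
    print(btc_decode_cell(mask, low, high))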

 

CellB

See the diagram

Picture: vq-tree.ps

 

QuickTime and Video for Windows

Apple and Microsoft have both defined standards for their respective systems to accommodate video. However, in both cases they are more concerned with defining a usable API, so that program developers can generate applications that interwork quickly and effectively. Thus, Video for Windows and QuickTime both specify the ways that video can be displayed and processed within the framework of the GUI systems on MS-Windows and Apple systems. However, neither specifies a particular video encoding. Rather, they assume that all kinds of encodings will be available, through hardware CODECs or through software, and so they provide meta-systems that allow the programmer to name the encoding and provide translations.

 

Quicktime & Video for Windows