www.sigmatrainers.com
SIGMA TRAINERS, AHMEDABAD (INDIA)
For more than 21 years | More than 1500 trainers
VIDEO COMPRESSING TECHNIQUES TRAINER (MODEL VIDEOCOMP100)
INTRODUCTION

This trainer includes theory and software used for different types of video compression techniques.
SPECIFICATIONS

1. Manual: Includes more than 200 pages discussing different types of video compression techniques.

2. Video compression formats: Compresses AVI, MPEG-1, MPEG-2, and WMV files, and compresses to MPEG using VCD, SVCD, or DVD profiles.

3. Video compression software:
   1. Blaze Media Pro
   2. Alparysoft Lossless Video Codec
   3. MSU Lossless Video Codec
   4. DivX Player with DivX Pro Codec (98/Me)
   5. Elecard MPEG-2 Decoder & Streaming Pack
VIDEO COMPRESSING TECHNIQUES - MPEG-2 VIDEO COMPRESSION

Video compression refers to reducing the quantity of data used to represent video content without excessively reducing the quality of the picture. It also reduces the number of bits required to store and/or transmit digital media. Compressed video can be transmitted more economically over a smaller carrier.

Digital video requires high data rates - the better the picture, the more data is ordinarily needed. This means powerful hardware, and lots of bandwidth when video is transmitted. However, much of the data in video is not necessary for achieving good perceptual quality because it can be easily predicted - for example, successive frames in a movie rarely change much from one to the next. This makes data compression work well with video.

Video compression can make video files far smaller with little perceptible loss in quality. For example, DVDs use a video coding standard called MPEG-2 that makes the movie 15 to 30 times smaller while still producing a picture quality that is generally considered high for standard-definition video. Without proper use of data compression techniques, either the picture would look much worse, or one would need more such disks per movie.

Theory

Video is basically a three-dimensional array of color pixels. Two dimensions serve as the spatial (horizontal and vertical) directions of the moving pictures, and one dimension represents the time domain. A frame is the set of all pixels that correspond to a single point in time; basically, a frame is the same as a still picture. (Frames are sometimes made up of fields; see interlacing.)

Video data contains spatial and temporal redundancy. Similarities can thus be encoded by merely registering differences within a frame (spatial) and/or between frames (temporal).
Spatial encoding takes advantage of the fact that the human eye cannot distinguish small differences in colour as easily as it can changes in brightness, so very similar areas of colour can be "averaged out" in much the same way as in JPEG images (JPEG image compression FAQ, part 1/2). With temporal compression, only the changes from one frame to the next are encoded, since a large number of pixels will often be the same across a series of frames (About video compression).

Lossless compression
Some forms of data compression are lossless. This means that when the data is decompressed, the result is a bit-for-bit perfect match with the original. While lossless compression of video is possible, it is rarely used. This is because any lossless compression system will sometimes produce a file (or portions of one) that is as large as, and/or has the same data rate as, the uncompressed original. As a result, all hardware in a lossless system would have to run fast enough to handle uncompressed video as well, which eliminates much of the benefit of compressing the data in the first place. For example, digital videotape cannot vary its data rate easily, so dealing with short bursts of maximum-data-rate video would be more complicated than simply running at the maximum rate all the time.

Intraframe vs. interframe compression
One of the most powerful techniques for compressing video is interframe compression. This works by comparing each frame in the video with the previous one. If the frame contains areas where nothing has moved, the system simply issues a short command that copies that part of the previous frame, bit-for-bit, into the next one. If objects move in a simple manner, the compressor emits a slightly longer command that tells the decompressor to shift, rotate, lighten, or darken the copy - a longer command, but still much shorter than intraframe compression.

Interframe compression is best for finished programs that will simply be played back by the viewer, but it can cause problems if used for editing. Since interframe compression copies data from one frame to another, if the original frame is cut out (or lost in transmission), the following frames cannot be reconstructed. Some video formats, such as DV, compress each frame independently, as if they were all unrelated still images, using image compression techniques. This is called intraframe compression. Editing intraframe-compressed video is almost as easy as editing uncompressed video - one finds the beginning and ending of each frame, copies bit-for-bit each frame one wants to keep, and discards the frames one doesn't want.

Another difference between intraframe and interframe compression is that with intraframe systems, each frame uses a similar amount of data. In interframe systems, certain frames called "I-frames" aren't allowed to copy data from other frames, and so require much more data than the other frames nearby. (The "I" stands for intra-coded.)

It is possible to build a computer-based video editor that spots problems caused when I-frames are edited out while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this process demands a lot more computing power than editing intraframe-compressed video with the same picture quality.
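The core idea of interframe compression - copy what didn't change, transmit only what did - can be sketched in a few lines. The 1-D "frames" and pixel values below are made-up illustration data, not any real codec format:

```python
# Illustrative sketch of temporal (interframe) compression:
# store only the pixels that changed between two frames.

def frame_delta(prev, curr):
    """Return a sparse list of (index, new_value) for changed pixels."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def apply_delta(prev, delta):
    """Reconstruct the current frame from the previous frame plus changes."""
    curr = list(prev)
    for i, value in delta:
        curr[i] = value
    return curr

prev = [10, 10, 10, 10, 20, 20, 20, 20]
curr = [10, 10, 10, 10, 20, 20, 99, 20]   # only one pixel changed

delta = frame_delta(prev, curr)
assert delta == [(6, 99)]                  # far smaller than a full frame
assert apply_delta(prev, delta) == curr    # decoder recovers the frame
```

This also shows why a lost frame is a problem for interframe editing: without `prev`, the delta alone cannot reconstruct `curr`.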
MPEG (MOVING PICTURES EXPERTS GROUP)

MPEG is a set of standards established for the compression of digital video and audio data. It is the universal standard for digital terrestrial, cable and satellite TV, DVDs and digital video recorders. MPEG uses lossy compression within each frame similar to JPEG, which means pixels from the original images are permanently discarded. It also uses interframe coding, which further compresses the data by encoding only the differences between periodic frames (see interframe coding). MPEG performs the actual compression using the discrete cosine transform (DCT) method (see DCT).

MPEG is an asymmetrical system: it takes longer to compress the video than it does to decompress it in the DVD player, PC, set-top box or digital TV set. As a result, in the early days, compression was performed only in the studio. As chips advanced and became less costly, they enabled digital video recorders, such as TiVos, to convert analog TV to MPEG and record it on disk in real time (see DVR).

MPEG-1 (Video CDs)

Although MPEG-1 supports higher resolutions, it is typically coded at 352x240 at 30 fps (NTSC) or 352x288 at 25 fps (PAL/SECAM). Full 704x480 and 704x576 frames (BT.601) were scaled down for encoding and scaled up for playback. MPEG-1 uses the YCbCr color space with 4:2:0 sampling, but did not provide a standard way of handling interlaced video. Data rates were limited to 1.8 Mbps, but this was often exceeded. See YCbCr sampling.

MPEG-2 (DVD, Digital TV)

MPEG-2 provides broadcast-quality video with resolutions up to 1920x1080. It supports a variety of audio/video formats, including legacy TV, HDTV and five-channel surround sound. MPEG-2 uses the YCbCr color space with 4:2:0, 4:2:2 and 4:4:4 sampling and supports interlaced video. Data rates are from 1.5 to 60 Mbps. See YCbCr sampling.

MPEG-4 (All Inclusive and Interactive)

MPEG-4 is an extremely comprehensive system for multimedia representation and distribution. Based on a variation of Apple's QuickTime file format, MPEG-4 offers a variety of compression options, including low-bandwidth formats for transmitting to wireless devices as well as high-bandwidth formats for studio processing. See H.264. MPEG-4 also incorporates AAC, a high-quality audio encoder. MPEG-4 AAC is widely used as an audio-only format (see AAC).

A major feature of MPEG-4 is its ability to identify and deal with separate audio and video objects in the frame, which allows separate elements to be compressed more efficiently and dealt with independently. User-controlled interactive sequences that include audio, video, text, 2D and 3D objects and animations are all part of the MPEG-4 framework. For more information, visit the MPEG Industry Forum at www.mpegif.org.

MPEG-7 (Meta-Data)

MPEG-7 is about describing multimedia objects and has nothing to do with compression. It provides a library of core description tools and an XML-based Description Definition Language (DDL) for extending the library with additional multimedia objects. Color, texture, shape and motion are examples of characteristics defined by MPEG-7.

MPEG-21 (Digital Rights Infrastructure)

MPEG-21 provides a comprehensive framework for storing, searching, accessing and protecting the copyrights of multimedia assets. It was designed to provide a standard for digital rights management as well as interoperability. MPEG-21 uses the "Digital Item" as a descriptor for all multimedia objects. Like MPEG-7, it does not deal with compression methods.

The Missing Numbers

MPEG-3 was abandoned after initial development because MPEG-2 was considered sufficient. Because MPEG-7 does not deal with compression, it was felt a higher number was needed to distance it from MPEG-4. MPEG-21 was coined for the 21st century.

MPEG vs. Motion JPEG

Before MPEG, a variety of non-standard Motion JPEG (M-JPEG) methods were used to create consecutive JPEG frames. Motion JPEG did not use interframe coding between frames and was easy to edit, but it was not as highly compressed as MPEG. For compatibility, video editors may support one of the Motion JPEG methods. MPEG can also be encoded without interframe compression for faster editing. See MP3, MPEG LA, MPEGIF.

MPEG-2

MPEG-2 is a standard for "the generic coding of moving pictures and associated audio information" [1]. It is widely used around the world to specify the format of the digital television signals that are broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV systems. It also specifies the format of movies and other programs that are distributed on DVD and similar disks. The standard allows text and other data, e.g., a program guide for TV viewers, to be added to the video and audio data streams. TV stations, TV receivers, DVD players, and other equipment are all designed to this standard. MPEG-2 was the second of several standards developed by the Moving Picture Experts Group (MPEG) and is an international standard (ISO/IEC 13818).
While MPEG-2 is the core of most digital television and DVD formats, it does not completely specify them. Regional institutions adapt it to their needs by restricting and augmenting aspects of the standard. See "Profiles and Levels" below.

MPEG-2 includes a Systems part (Part 1) that defines two distinct (but related) container formats. One is Transport Stream, which is designed to carry digital video and audio over somewhat unreliable media. MPEG-2 Transport Stream is commonly used in broadcast applications, such as ATSC and DVB. MPEG-2 Systems also defines Program Stream, a container format designed for reasonably reliable media such as disks. MPEG-2 Program Stream is used in the DVD and SVCD standards.

The Video part (Part 2) of MPEG-2 is similar to MPEG-1, but also provides support for interlaced video (the format used by analog broadcast TV systems). MPEG-2 video is not optimized for low bit rates (less than 1 Mbit/s), but outperforms MPEG-1 at 3 Mbit/s and above. All standards-conforming MPEG-2 Video decoders are fully capable of playing back MPEG-1 Video streams. With some enhancements, MPEG-2 Video and Systems are also used in most HDTV transmission systems.

The Audio part (Part 3) enhances MPEG-1's audio by allowing the coding of audio programs with more than two channels. Part 3 allows this to be done in a backwards-compatible way, so that MPEG-1 audio decoders can still decode the two main stereo channels of the presentation.
Part 7 of the MPEG-2 standard specifies a rather different, non-backwards-compatible audio format. Part 7 is referred to as MPEG-2 AAC. While AAC is more efficient than the previous MPEG audio standards, it is much more complex to implement, and somewhat more powerful hardware is needed for encoding and decoding.
Video coding (simplified)

An HDTV camera generates a raw video stream of more than one billion bits per second. This stream must be compressed if digital TV is to fit in the bandwidth of available TV channels and if movies are to fit on DVDs. Fortunately, video compression is practical because the data in pictures is often redundant in space and time. For example, the sky can be blue across the top of a picture, and that blue sky can persist for frame after frame. Also, because of the way the eye works, it is possible to delete some data from video pictures with almost no noticeable degradation in image quality.

TV cameras used in broadcasting usually generate 50 pictures a second (in Europe and elsewhere) or 59.94 pictures a second (in North America and elsewhere). Digital television requires that these pictures be digitized so that they can be processed by computer hardware. Each picture element (a pixel) is then represented by one luminance number and two chrominance numbers, which describe the brightness and the color of the pixel (see YUV). Thus, each digitized picture is initially represented by three rectangular arrays of numbers.

A common (and old) trick to reduce the amount of data that must be processed per second is to separate the picture into two fields: the "top field," which is the odd-numbered rows, and the "bottom field," which is the even-numbered rows. The two fields are displayed alternately. This is called interlaced video. Two successive fields are called a frame, and the typical frame rate is then 25 or 29.97 frames a second. If the video is not interlaced, it is called progressive video and each picture is a frame. MPEG-2 supports both options.

Another trick to reduce the data rate is to thin out the two chrominance matrices. In effect, the remaining chrominance values represent the nearby values that are deleted. Thinning works because the eye is more responsive to brightness than to color.
The 4:2:2 chrominance format indicates that half the chrominance values have been deleted; the 4:2:0 chrominance format indicates that three quarters of them have been deleted. If no chrominance values have been deleted, the chrominance format is 4:4:4. MPEG-2 allows all three options.

MPEG-2 specifies that the raw frames be compressed into three kinds of frames: I(ntra-coded)-frames, P(redictive-coded)-frames, and B(idirectionally predictive-coded)-frames.

An I-frame is a compressed version of a single uncompressed (raw) frame. It takes advantage of spatial redundancy and of the inability of the eye to detect certain changes in the image. Unlike P-frames and B-frames, I-frames do not depend on data in the preceding or the following frames. Briefly, the raw frame is divided into 8 pixel by 8 pixel blocks. The data in each block is transformed by a "discrete cosine transform," which yields an 8 by 8 matrix of coefficients. The transform does not change the information in the block; the original block can be recreated exactly by applying the inverse cosine transform. The math is a little esoteric but, roughly, the transform converts spatial variations into frequency variations.

The advantage of doing this is that the image can now be simplified by quantizing the coefficients. Many of the coefficients, usually the higher-frequency components, will then be zero. The penalty of this step is the loss of some subtle distinctions in brightness and color: if one applies the inverse transform to the matrix after it is quantized, one gets an image that looks very similar to the original but is not quite as nuanced. Next, the quantized coefficient matrix is itself compressed. Typically, one corner of the quantized matrix is filled with zeros.
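The transform-and-quantize step for an I-frame block can be sketched as follows. The naive DCT below and the single fixed quantizer step are illustrative only; real MPEG-2 encoders use fast DCT algorithms and per-coefficient quantization matrices:

```python
import math

N = 8  # MPEG-2 I-frame blocks are 8x8

def dct2(block):
    """Naive 2-D DCT-II of an NxN block (O(N^4); fine for illustration)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step=16):
    """Scalar quantization: this is the lossy step."""
    return [[round(c / step) for c in row] for row in coeffs]

# A flat block: after the DCT, almost all energy sits in the top-left
# (DC) coefficient, and quantization zeroes everything else.
block = [[128] * N for _ in range(N)]
q = quantize(dct2(block))
assert q[0][0] == 64     # DC term: 128 * 8 / 16
assert all(q[u][v] == 0 for u in range(N) for v in range(N) if (u, v) != (0, 0))
```

A block with fine texture would instead leave small nonzero high-frequency coefficients, which is where the quantizer trades quality for bits.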
By starting in the opposite corner of the matrix, then zigzagging through the matrix to combine the coefficients into a string, then substituting run-length codes for consecutive zeros in that string, and then applying Huffman coding to that result, one reduces the matrix to a smaller array of numbers. It is this array that is broadcast or put on DVDs. In the receiver or the player, the whole process is reversed, enabling the receiver to reconstruct, to a close approximation, the original frame.

Typically, every 15th frame or so is made into an I-frame. P-frames and B-frames might follow an I-frame like this, IBBPBBPBBPBB(I), to form a Group of Pictures (GOP); however, the standard is flexible about this.
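The zigzag scan and zero run-length step can be sketched like this. A 4x4 matrix is used to keep the example short (real MPEG-2 uses 8x8), and the final Huffman pass is omitted:

```python
# Sketch of the entropy-coding step: zigzag-scan a quantized matrix,
# then run-length code the zeros.

def zigzag(matrix):
    """Read an NxN matrix along anti-diagonals, alternating direction."""
    n = len(matrix)
    out = []
    for d in range(2 * n - 1):
        cells = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        if d % 2 == 0:
            cells.reverse()          # walk up-right on even diagonals
        out.extend(matrix[i][j] for i, j in cells)
    return out

def rle_zeros(seq):
    """Encode as (zero_run_length, value) pairs, ending with an EOB marker."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((run, "EOB"))       # end-of-block
    return pairs

m = [[9, 2, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
scanned = zigzag(m)
assert scanned[:4] == [9, 2, 1, 0]   # low frequencies come out first
assert rle_zeros(scanned) == [(0, 9), (0, 2), (0, 1), (13, "EOB")]
```

The zigzag order matters because quantization concentrates zeros in the high-frequency corner, so the scan produces long zero runs that the run-length coder collapses.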
P-frames provide more compression than I-frames because they take advantage of the data in the previous I-frame or P-frame. I-frames and P-frames are called reference frames. To generate a P-frame, the previous reference frame is reconstructed, just as it would be in a TV receiver or DVD player. The frame being compressed is divided into 16 pixel by 16 pixel "macroblocks." Then, for each of those macroblocks, the reconstructed reference frame is searched to find the 16 by 16 macroblock that best matches the macroblock being compressed. The offset is encoded as a "motion vector." Frequently, the offset is zero, but if something in the picture is moving, the offset might be something like 23 pixels to the right and 4 pixels up.

The match between the two macroblocks will often not be perfect. To correct for this, the encoder computes the strings of coefficient values as described above for both macroblocks and then subtracts one from the other. This "residual" is appended to the motion vector, and the result is sent to the receiver or stored on the DVD for each macroblock being compressed. Sometimes no suitable match is found; the macroblock is then treated like an I-frame macroblock.

The processing of B-frames is similar to that of P-frames, except that B-frames use the picture in the following reference frame as well as the picture in the preceding reference frame. As a result, B-frames usually provide more compression than P-frames. B-frames are never reference frames.

While the above paragraphs generally describe MPEG-2 video compression, many details are not discussed, including details involving fields, chrominance formats, responses to scene changes, special codes that label the parts of the bitstream, and so on. MPEG-2 compression is complicated. TV cameras capture pictures at a regular rate; TV receivers display pictures at a regular rate; in between, all kinds of things are happening. But it works.
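The motion search at the heart of P-frame coding can be sketched with a tiny 1-D example. Real MPEG-2 searches 16x16 macroblocks in two dimensions; the 4-sample block, sample values, and search window here are illustrative only:

```python
# Sketch of block-matching motion estimation: find the offset into the
# reference frame that minimizes the sum of absolute differences (SAD).

def sad(a, b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def motion_vector(reference, block, pos, search=3):
    """Best offset within +/-search of pos, and its SAD (residual cost)."""
    best = None
    for off in range(-search, search + 1):
        start = pos + off
        if 0 <= start <= len(reference) - len(block):
            cost = sad(reference[start:start + len(block)], block)
            if best is None or cost < best[1]:
                best = (off, cost)
    return best

reference = [0, 0, 5, 9, 9, 5, 0, 0, 0, 0]
# The same "object" shifted two samples to the right in the current frame:
current   = [0, 0, 0, 0, 5, 9, 9, 5, 0, 0]
block_pos = 4
block = current[block_pos:block_pos + 4]       # [5, 9, 9, 5]

# Offset -2 gives a perfect match, so only the vector need be sent:
assert motion_vector(reference, block, block_pos) == (-2, 0)
```

When the best SAD is nonzero, the encoder transmits the vector plus the quantized residual; when no offset matches well, the macroblock falls back to I-frame-style coding, exactly as described above.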
Audio encoding
MPEG-2 also introduces new audio encoding methods:
• low-bitrate encoding with halved sampling rate (MPEG-1 Layer 1/2/3 LSF)
• multichannel encoding with up to 5.1 channels
• MPEG-2 AAC
Profiles and Levels

MPEG-2 Profiles

Abbr.  Name             Frames   YUV    Streams  Comment
SP     Simple Profile   P, I     4:2:0  1        no interlacing
MP     Main Profile     P, I, B  4:2:0  1
422P   4:2:2 Profile    P, I, B  4:2:2  1
SNR    SNR Profile      P, I, B  4:2:0  1-2      SNR: Signal-to-Noise Ratio
SP     Spatial Profile  P, I, B  4:2:0  1-3      low, normal and high quality decoding
HP     High Profile     P, I, B  4:2:2  1-3
MPEG-2 Levels

Abbr.  Name        Pixels/line  Lines  Framerate (Hz)  Bitrate (Mbit/s)
LL     Low Level   352          288    30              4
ML     Main Level  720          576    30              15
H-14   High 1440   1440         1152   30              60
HL     High Level  1920         1152   30              80
Profile @ Level  Resolution (px)  Framerate max. (Hz)  Sampling  Bitrate (Mbit/s)  Example Application
SP@LL            176 × 144        15                   4:2:0     0.096             Wireless handsets
SP@ML            352 × 288        15                   4:2:0     0.384             PDAs
MP@LL            320 × 240        24                   4:2:0     4                 Set-top boxes (STB)
                 352 × 288        30
MP@ML            720 × 480        30                   4:2:0     15 (DVD: 9.8)     DVD, SD-DVB
                 720 × 576        25
MP@H-14          1440 × 1080      30                   4:2:0     60 (HDV: 25)      HDV
                 1280 × 720       30
MP@HL            1920 × 1080      30                   4:2:0     80                ATSC 1080i, 720p60, HD-DVB (HDTV)
                 1280 × 720       60
422P@LL          -                -                    4:2:2     -
422P@ML          720 × 480        30                   4:2:2     50                Sony IMX using I-frame only, Broadcast "contribution" video (I&P only)
                 720 × 576        25
422P@H-14        1440 × 1080      30                   4:2:2     80                Potential future MPEG-2-based HD products from Sony and Panasonic
                 1280 × 720       60
422P@HL          1920 × 1080      30                   4:2:2     300               Potential future MPEG-2-based HD products from Panasonic
                 1280 × 720       60
DVD
The DVD standard uses MPEG-2 video, but imposes some restrictions:

• Allowed resolutions
  o 720 × 480, 704 × 480, 352 × 480, 352 × 240 pixels (NTSC)
  o 720 × 576, 704 × 576, 352 × 576, 352 × 288 pixels (PAL)
• Allowed aspect ratios (display AR)
  o 4:3
  o 16:9
  o (2.21:1 is often listed as a valid DVD aspect ratio, but is actually just a 16:9 image with the top and bottom of the frame masked in black)
• Allowed frame rates
  o 29.97 frame/s (NTSC)
  o 25 frame/s (PAL)
  Note: By using a pattern of REPEAT_FIRST_FIELD flags in the headers of encoded pictures, pictures can be displayed for either two or three fields, and almost any picture display rate (minimum ⅔ of the frame rate) can be achieved. This is most often used to display 23.976 (approximately film rate) video on NTSC.
• Audio+video bitrate
  o Video peak 9.8 Mbit/s
  o Total peak 10.08 Mbit/s
  o Minimum 300 kbit/s
• YUV 4:2:0
• Additional subtitles possible
• Closed captioning (NTSC only)
• Audio
  o Linear Pulse Code Modulation (LPCM): 48 kHz or 96 kHz; 16- or 24-bit; up to six channels (not all combinations possible due to bitrate constraints)
  o MPEG Layer 2 (MP2): 48 kHz, up to 5.1 channels (required in PAL players only)
  o Dolby Digital (DD, also known as AC-3): 48 kHz, 32-448 kbit/s, up to 5.1 channels
  o Digital Theater Systems (DTS): 754 kbit/s or 1510 kbit/s (not required for DVD player compliance)
  o NTSC DVDs must contain at least one LPCM or Dolby Digital audio track; PAL DVDs must contain at least one MPEG Layer 2, LPCM, or Dolby Digital audio track
  o Players are not required to play back audio with more than two channels, but must be able to downmix multichannel audio to two channels
• GOP structure
  o A sequence header must be present at the beginning of every GOP
  o Maximum frames per GOP: 18 (NTSC) / 15 (PAL), i.e. 0.6 seconds for both
  o A closed GOP is required for multiple-angle DVDs
DVB
Application-specific restrictions on MPEG-2 video in the DVB standard:

Allowed resolutions for SDTV:
• 720, 640, 544, 480 or 352 × 480 pixels, 24/1.001, 24, 30/1.001 or 30 frame/s
• 352 × 240 pixels, 24/1.001, 24, 30/1.001 or 30 frame/s
• 720, 704, 544, 480 or 352 × 576 pixels, 25 frame/s
• 352 × 288 pixels, 25 frame/s

For HDTV:
• 720 x 576 x 50 frame/s progressive (576p50)
• 1280 x 720 x 25 or 50 frame/s progressive (720p50)
• 1440 or 1920 x 1080 x 25 frame/s progressive (1080p25 - film mode)
• 1440 or 1920 x 1080 x 25 frame/s interlaced (1080i25)
• 1920 x 1080 x 50 frame/s progressive (1080p50), a possible future H.264/AVC format
ATSC
Allowed resolutions:
• 1920 × 1080 pixels, 30 frame/s (1080i)
• 1280 × 720 pixels, 60 frame/s (720p)
• 720 × 576 pixels, 25 frame/s (576i, 576p)
• 720 or 640 × 480 pixels, 30 frame/s (480i, 480p)
Note: 1080i is encoded with 1920 × 1088 pixel frames, but the last 8 lines are discarded prior to display.

ISO/IEC 13818

Part 1 - Systems: describes synchronization and multiplexing of video and audio.
Part 2 - Video: compression codec for interlaced and non-interlaced video signals.
Part 3 - Audio: compression codec for perceptual coding of audio signals. A multichannel-enabled extension of MPEG-1 audio.
Part 4 - Describes procedures for testing compliance.
Part 5 - Describes systems for software simulation.
Part 6 - Describes extensions for DSM-CC (Digital Storage Media Command and Control).
Part 7 - Advanced Audio Coding (AAC).
Part 9 - Extension for real-time interfaces.
Part 10 - Conformance extensions for DSM-CC.
(Part 8, a 10-bit video extension whose primary application was studio video, has been withdrawn due to lack of interest by industry.)

Current forms

Today, nearly all video compression methods in common use (e.g., those in standards approved by the ITU-T or ISO) apply a discrete cosine transform (DCT) for spatial redundancy reduction. Other methods, such as fractal compression, matching pursuits, and the use of a discrete wavelet transform (DWT), have been the subject of some research, but are typically not used in practical products (except for the use of wavelet coding as still-image coders without motion compensation). Interest in fractal compression seems to be waning, due to recent theoretical analysis showing a comparative lack of effectiveness of such methods.

The use of most video compression techniques (e.g., DCT- or DWT-based techniques) involves quantization. The quantization can be either scalar or vector quantization; however, nearly all practical designs use scalar quantization because of its greater simplicity.

In broadcast engineering, digital television (DVB, ATSC and ISDB) is made practical by video compression. TV stations can broadcast not only HDTV, but multiple virtual channels on the same physical channel as well. It also conserves precious bandwidth on the radio spectrum. Nearly all digital video broadcast today uses the MPEG-2 standard video compression format, although H.264/MPEG-4 AVC and VC-1 are emerging contenders in that domain.
Multimedia compression formats

Video compression formats
  ISO/IEC: MPEG-1 | MPEG-2 | MPEG-4 | MPEG-4/AVC
  ITU-T: H.261 | H.262 | H.263 | H.264
  Others: AVS | Dirac | Indeo | MJPEG | RealVideo | VC-1 | Theora | VP6 | VP7 | WMV

Audio compression formats
  ISO/IEC MPEG: MPEG-1 Layer III (MP3) | MPEG-1 Layer II | AAC | HE-AAC
  ITU-T: G.711 | G.722 | G.722.1 | G.722.2 | G.723 | G.723.1 | G.726 | G.728 | G.729 | G.729.1 | G.729a | iLBC
  Others: AC3 | ATRAC | FLAC | Monkey's Audio | Musepack | RealAudio | SHN | Speex | Vorbis | WavPack | WMA

Image compression formats
  ISO/IEC/ITU-T: JPEG | JPEG 2000 | JPEG-LS | JBIG | JBIG2
  Others: BMP | GIF | ILBM | PCX | PNG | TGA | TIFF | WMP

Media container formats
  General: 3GP | ASF | AVI | FLV | Matroska | MP4 | MXF | NUT | Ogg | Ogg Media | QuickTime | RealMedia
  Audio-only: AIFF | AU | WAV
Digital Compression

An uncompressed SDI signal outputs 270 Mb of data every second. In digital broadcasting, compression is essential to squeeze all this data into a 10 MHz RF channel. Many people mistakenly equate the term "bit rate" with picture quality; "bit rate" actually refers to how the signal is processed. Thanks to the unique modular design of all Gigawave digital microwave links, the plug-in encoder and modulator modules can easily be changed on-site, or upgraded as new compression techniques evolve.
Compression Techniques used in Telecommunications and Broadcasting:

Standard              Bit Rate (Mb/s)  Delay
ETSI 140              140              0
ETSI 34               34               Negligible
ETSI 17               17               -
ETSI 8                8                -
DigiBeta              120 (approx.)    -
Digital S             50               -
MPEG 1                1.5              -
MPEG 2                1.5 - 80         2 - 24 frames
Beta SX               18               -
EBU                   24               -
News                  8                Negligible
MPEG 4                N/A              -
Motion JPEG           30 - 100         -
JPEG 2000             N/A              -
DVC Pro 25/50/100     25/50/100        3 frames
DVCam                 25               3 frames
DV                    25               3 frames
Wavelets              18 - 100         <1 ms
Firewire (IEEE 1394)  100/200/400      3 frames
Typical Compression Techniques used in IT:

Standard   Bit Rate (Mb/s)  Delay
Media 9    N/A              -
Ethernet   10, 100, 1000    -
SCSI       40               -
SCSI II    160              -
MPEG 4     N/A              -
AUDIO COMPRESSION TECHNIQUES

Many different compression techniques exist for various forms of data. Video compression is simpler because many pixels are repeated in groups. Techniques for still pictures include horizontal repeated-pixel compression (PCX format), data conversion (GIF format), and fractal-path repeated pixels. For motion video, compression is relatively easy because large portions of the screen don't change between frames; therefore, only the changes between images need to be stored.

Text compression is extremely simple compared to video and audio. One method counts the probability of each character and then reassigns smaller bit values to the most common characters and larger bit values to the least common characters. However, digital samples of audio data have proven very difficult to compress; these techniques do not work well at all for audio. The data change often, and no values are common enough to save sufficient space. Currently, five methods are used to compress audio data, with varying degrees of complexity, compressed audio quality, and amount of data compression.
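The character-probability method for text described above is essentially Huffman coding, and can be sketched as follows. The sample text and its frequencies are illustrative:

```python
import heapq
from collections import Counter

# Sketch of probability-based text compression: count character
# frequencies, then build a Huffman code so common characters get
# short bit strings and rare ones get long bit strings.

def huffman_codes(text):
    """Map each character to a bit string, shorter for frequent characters."""
    heap = [(freq, i, ch) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    codes = {ch: "" for _, _, ch in heap}
    tick = len(heap)                     # tie-breaker for equal frequencies
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        for ch in left:                  # left subtree gets a leading 0
            codes[ch] = "0" + codes[ch]
        for ch in right:                 # right subtree gets a leading 1
            codes[ch] = "1" + codes[ch]
        heapq.heappush(heap, (f1 + f2, tick, left + right))
        tick += 1
    return codes

text = "aaaaaaabbbcc"
codes = huffman_codes(text)
# The most common character never gets a longer code than a rarer one:
assert len(codes["a"]) <= len(codes["b"]) <= len(codes["c"])
compressed_bits = sum(len(codes[ch]) for ch in text)
assert compressed_bits < len(text) * 8   # beats fixed 8-bit characters
```

As the section notes, this approach fails for raw audio samples: no sample value is frequent enough for the short codes to pay off.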
Sampling Basics

The digital representation of audio data offers many advantages: high noise immunity, stability, and reproducibility. Audio in digital form also allows for efficient implementation of many audio processing functions through the computer. Converting audio from analog to digital begins by sampling the audio input at regular, discrete intervals of time and quantizing the sampled values into a discrete number of evenly spaced levels. According to Nyquist theory, a time-sampled signal can faithfully represent frequencies up to half the sampling rate; above that threshold, frequencies become blurred and signal noise becomes readily apparent. The sampling frequencies in use today range from 8 kHz for basic speech to 48 kHz for commercial DAT machines.

The number of quantizer levels is typically a power of 2 to make full use of a fixed number of bits per audio sample. The typical range is 8 to 16 bits per sample, which allows for 256 to 65,536 quantization levels. With each additional bit of quantizer spacing, the signal-to-noise ratio increases by roughly 6 decibels (dB). Thus, the dynamic range capability of these representations is from 48 to 96 dB, respectively.

The data rates associated with uncompressed digital audio are substantial. For audio data on a CD, for example, which is sampled at 44.1 kHz with 16 bits per channel for two channels, about 1.4 megabits per second are processed. A clear need exists for some form of compression to enable the more efficient storage and transmission of digital audio data.
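The figures quoted in this section can be checked directly:

```python
# Working out the uncompressed CD-audio numbers quoted above.

sample_rate = 44_100        # samples per second
bits_per_sample = 16
channels = 2

bit_rate = sample_rate * bits_per_sample * channels
assert bit_rate == 1_411_200                 # about 1.4 megabits per second

# Quantizer levels are powers of two:
assert 2 ** 8 == 256 and 2 ** 16 == 65_536

# Each extra quantizer bit adds roughly 6 dB of signal-to-noise ratio,
# giving the 48-96 dB dynamic range quoted for 8-16 bit samples:
assert 8 * 6 == 48 and 16 * 6 == 96
```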
Voc File Compression
The simplest compression techniques simply remove any silence from the entire sample. Creative Labs introduced this form of compression with their Soundblaster line of sound cards. The method analyzes the whole sample and then codes the silence into the sample using byte codes. It is very similar to run-length coding.

Linear Predictive Coding and Code Excited Linear Predictor

This was an early development in audio compression, used primarily for speech. A Linear Predictive Coding (LPC) encoder compares speech to an analytical model of the vocal tract, then throws away the speech and stores the parameters of the best-fit model. The output quality was poor, often compared to computer speech, and thus LPC is not used much today.
A later development, Code Excited Linear Predictor (CELP), increased the complexity of the speech model further, while allowing for greater compression thanks to faster computers, and produced much better results: sound quality improved while the compression ratio increased. The algorithm compares speech with an analytical model of the vocal tract and computes the errors between the original speech and the model. It transmits both the model parameters and a very compressed representation of the errors.
Mu-law and A-law compression

Logarithmic compression is a good method because it matches the way the human ear works. It loses only information the ear would not hear anyway, and gives good-quality results for both speech and music. Although the compression ratio is not very high, it requires very little processing power to achieve. It is the international standard telephony encoding format, also known as the ITU (formerly CCITT) standard, and is commonly used in North America and Japan for ISDN 8 kHz-sampled, voice-grade digital telephone service. It packs each 16-bit sample into 8 bits by using a logarithmic table to encode a 13-bit dynamic range, dropping the 3 least significant bits of precision. The quantization levels are dispersed unevenly instead of linearly to mimic the way the human ear perceives sound levels. Unlike linear quantization, the logarithmic step spacings represent low-amplitude samples with greater accuracy than higher-amplitude samples. This method is fast and compresses the data to half the size of the original sample, and it is used quite widely due to the universal nature of its adoption.
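The logarithmic companding curve itself can be sketched as follows. This shows the standard mu-law formula with mu = 255, not the exact G.711 segment-and-bit layout used in real telephone codecs:

```python
import math

MU = 255  # mu value used in North American / Japanese telephony

def mu_law_compress(x):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse mapping: recover the original sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

quiet, loud = 0.01, 0.5
# The quiet sample is boosted far more than the loud one before the
# 8-bit rounding, which is what preserves low-level detail:
assert mu_law_compress(quiet) > 5 * quiet
assert mu_law_compress(loud) < 2 * loud
# The curve itself is invertible; the information loss in a real codec
# comes from rounding the companded value to 8 bits:
assert abs(mu_law_expand(mu_law_compress(quiet)) - quiet) < 1e-12
```

Quantizing the companded value uniformly is equivalent to quantizing the original sample with steps that grow with amplitude, which is exactly the uneven level spacing described above.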
Adaptive Differential Pulse Code Modulation (ADPCM)
The Interactive Multimedia Association (IMA) is a consortium of computer hardware and software vendors cooperating to develop a standard for multimedia data. Their goal was to select a public-domain audio compression algorithm able to provide a good compression ratio while maintaining good audio quality. In addition, the coding had to be simple enough to enable software-only decoding of 44.1 kHz samples on a 20 MHz, 386-class computer. This process is a simple conversion based on the assumption that the changes between samples will not be very large. The first sample value is stored in its entirety, and each successive value describes the amount, within +/- 8 levels, that the wave will change, which uses only 4 instead of 16 bits. Therefore, a 4:1 compression ratio is achieved, with less loss as the sampling frequency increases. At 44.1 kHz, the compressed signal is an accurate representation of the uncompressed sample that is difficult to discern from the original. This method is used widely today because of its simplicity, wide acceptance, and high level of compression.
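The delta idea can be sketched as follows. This is a toy fixed-step differential coder, not the full IMA ADPCM algorithm (which also adapts the step size from a standard table), but it shows how 4-bit change codes stand in for 16-bit samples:

```python
def delta_encode(samples, step=512):
    """Toy 4-bit differential coder: the first 16-bit sample is stored
    whole, then each later sample as a change of -8..+7 steps."""
    codes = []
    predicted = samples[0]
    for s in samples[1:]:
        diff = s - predicted
        code = max(-8, min(7, round(diff / step)))
        codes.append(code)
        # Track the value the decoder will reconstruct, so encoder and
        # decoder predictions stay in lockstep.
        predicted = max(-32768, min(32767, predicted + code * step))
    return samples[0], codes

def delta_decode(first, codes, step=512):
    out = [first]
    predicted = first
    for code in codes:
        predicted = max(-32768, min(32767, predicted + code * step))
        out.append(predicted)
    return out

first, codes = delta_encode([0, 400, 900, 1500, 1400])
print(codes)                     # each code fits in 4 signed bits
print(delta_decode(first, codes))
```

The reconstruction is approximate (each sample is off by up to half a step), which is the loss this scheme trades for its 4:1 ratio.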
MPEG
The Motion Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression. It is one part of a three-part compression standard, the other two parts being video and systems. MPEG compression is lossy, but nonetheless can achieve transparent, perceptually lossless compression. It is firmly founded in psychoacoustic theory. The premise behind this technique is simple: if a sound cannot be heard by the listener, then it does not need to be coded. Human hearing is quite sensitive, but discerning differences in a collage of sounds is quite difficult. Masking is the phenomenon where a strong signal "covers" the sound of another signal such that the softer one cannot be heard by the human ear. An extension of this is temporal masking, which describes masking of a soft sound after a loud sound has stopped. The time, measured under scientific conditions, that it takes to hear the softer sound is about 5 ms. Because the sensitivity of the ear is not linear but is instead dependent upon the frequency, masking effects differ depending on the frequency of the sounds. MPEG compression uses masking as the basis for compressing the audio data: those sounds that cannot be heard by the human ear do not need to be encoded. The audio spectrum is divided into 32 frequency bands, because sound masking occurs over a range of frequencies around each loud sound. Then the volume levels are measured in each band to detect any masking. Masking effects are taken into account, and the signal is then encoded.
In addition to encoding a single signal, MPEG compression supports one or two audio channels in one of four modes:
1) Monophonic
2) Dual monophonic -- two independent channels
3) Stereo -- for stereo channels that share bits, but without joint-stereo coding
4) Joint stereo -- takes advantage of the correlations between stereo channels
The MPEG method allows for a compression ratio of up to 6:1. Under optimal listening conditions, expert listeners could not distinguish the coded and original audio clips. Thus, although this technique is lossy, it still produces accurate representations of the original audio signal.
SPEECH COMPRESSION

I. Introduction
The compression of speech signals has many practical applications. One example is digital cellular technology, where many users share the same frequency bandwidth; compression allows more users to share the system than otherwise possible. Another example is digital voice storage (e.g. answering machines): for a given memory size, compression allows longer messages to be stored. Historically, digital speech signals have been sampled at a rate of 8000 samples/sec. Typically, each sample is represented by 8 bits (using mu-law). This corresponds to an uncompressed rate of 64 kbps (kbits/sec). With current compression techniques (all of which are lossy), it is possible to reduce the rate to 8 kbps with almost no perceptible loss in quality. Further compression is possible at the cost of lower quality. All of the current low-rate speech coders are based on the principle of linear predictive coding (LPC), which is presented in the following sections.
II. LPC Modeling
A. Physical Model:
When you speak:
• Air is pushed from your lungs through your vocal tract, and out of your mouth comes speech.
• For certain voiced sounds, your vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of your voice. Women and young children tend to have high pitch (fast vibration) while adult males tend to have low pitch (slow vibration).
• For certain fricative and plosive (or unvoiced) sounds, your vocal cords do not vibrate but remain constantly open.
• The shape of your vocal tract determines the sound that you make. As you speak, your vocal tract changes its shape, producing different sounds. The shape of the vocal tract changes relatively slowly (on the scale of 10 msec to 100 msec).
• The amount of air coming from your lungs determines the loudness of your voice.
B. Mathematical Model:
• The model says that the digital speech signal is the output of a digital filter (called the LPC filter) whose input is either a train of impulses or a white noise sequence. This is often called the LPC Model.
• The relationship between the physical and the mathematical models:

    Vocal tract                 <-> LPC filter
    Air                         <-> Innovations (filter input)
    Vocal cord vibration        <-> Voiced excitation
    Vocal cord vibration period <-> Pitch period
    Fricatives and plosives     <-> Unvoiced excitation
    Air volume                  <-> Gain
• The LPC filter is given by:

    H(z) = 1 / (1 - a_1 z^(-1) - a_2 z^(-2) - ... - a_10 z^(-10))

  which is equivalent to saying that the input-output relationship of the filter is given by the linear difference equation:

    s(n) = a_1 s(n-1) + a_2 s(n-2) + ... + a_10 s(n-10) + e(n)

  where s(n) is the speech signal and e(n) is the innovation (the filter input, scaled by the gain).
• The model changes every 20 msec or so. At a sampling rate of 8000 samples/sec, 20 msec is equivalent to 160 samples. The digital speech signal is divided into frames of size 20 msec; there are 50 frames/second.
• Thus the 160 samples of a frame are compactly represented by the 13 values of the model: the 10 LPC coefficients a_1, ..., a_10, the gain G, the pitch period P, and the voiced/unvoiced decision.
• There is almost no perceptual difference if:
  o For voiced sounds (V): the impulse train is shifted (the ear is insensitive to phase change).
  o For unvoiced sounds (UV): a different white noise sequence is used.
• LPC Synthesis: given the model parameters, generate the speech signal (this is done using standard filtering techniques).
• LPC Analysis: given the speech signal, find the best model parameters (this is described in the next section).
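LPC synthesis can be sketched in a few lines of Python. The filter coefficients and pitch period below are illustrative placeholders, not values from a real coder, and a toy order-2 filter stands in for the order-10 filter described above:

```python
import random

def lpc_synthesize(a, gain, pitch_period, voiced, n=160):
    """Sketch of LPC synthesis for one 20-msec frame (160 samples at
    8 kHz): excite the all-pole LPC filter with an impulse train
    (voiced) or white noise (unvoiced)."""
    order = len(a)
    out = []
    for i in range(n):
        if voiced:
            e = 1.0 if i % pitch_period == 0 else 0.0   # impulse train
        else:
            e = random.gauss(0.0, 1.0)                  # white noise
        # s(n) = a_1 s(n-1) + ... + a_order s(n-order) + excitation
        s = gain * e
        for k in range(1, order + 1):
            if i - k >= 0:
                s += a[k - 1] * out[i - k]
        out.append(s)
    return out

# Toy stable order-2 filter; a real coder uses 10 coefficients.
frame = lpc_synthesize([1.3, -0.9], gain=1.0, pitch_period=40, voiced=True)
print(len(frame))   # 160
```

Each impulse "rings" the filter, and the filter's resonances shape that ringing into the spectral envelope of the frame.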
III. LPC Analysis

• Consider one frame of speech signal: s(n), n = 0, 1, ..., 159.
• The signal s(n) is related to the innovation e(n) through the linear difference equation:

    e(n) = s(n) - a_1 s(n-1) - a_2 s(n-2) - ... - a_10 s(n-10)

• The ten LPC parameters a_1, ..., a_10 are chosen to minimize the energy of the innovation:

    E = sum over n of e(n)^2

• Using standard calculus, we take the derivative of E with respect to each a_i and set it to zero.
• We now have 10 linear equations with 10 unknowns:

    sum over k of a_k R(|i - k|) = R(i),   i = 1, ..., 10

  where R(i) is the autocorrelation of the frame s(n).
• The above matrix equation could be solved using:
  o The Gaussian elimination method.
  o Any matrix inversion method (MATLAB).
  o The Levinson-Durbin recursion (described below).
• Levinson-Durbin Recursion: an efficient method that solves the above equations order by order, exploiting the Toeplitz structure of the autocorrelation matrix.
• To get the other three parameters, solve the above for a_1, ..., a_10, and then:
  o Solve for the innovation e(n).
  o Then calculate the autocorrelation of e(n).
  o Then make a decision based on the autocorrelation: a strong periodic peak indicates voiced speech, and its lag gives the pitch period; the absence of a clear peak indicates unvoiced speech. The gain is set from the energy of the innovation.
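The Levinson-Durbin recursion itself can be sketched in Python; this is the standard textbook formulation, solving the normal (Yule-Walker) equations from the autocorrelation values:

```python
def levinson_durbin(r, order):
    """Solve sum_k a_k R(|i-k|) = R(i), i = 1..order, recursively.
    `r` holds the autocorrelation values R(0)..R(order)."""
    a = [0.0] * order
    err = r[0]                      # prediction error energy so far
    for i in range(order):
        # Reflection coefficient for stepping up to order i+1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):          # update the lower coefficients
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)        # error shrinks at each order
    return a, err

# Autocorrelation of a strongly correlated toy signal, order 2.
coeffs, err = levinson_durbin([1.0, 0.9, 0.7], order=2)
print([round(c, 3) for c in coeffs], round(err, 3))
```

For the order-2 example the result matches direct Gaussian elimination of the two normal equations, but the recursion needs only O(order^2) operations instead of O(order^3).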
IV. 2.4 kbps LPC Vocoder

• The following is a block diagram of a 2.4 kbps LPC Vocoder (block diagram not reproduced).
• The LPC coefficients are represented as line spectrum pair (LSP) parameters. LSP are mathematically equivalent (one-to-one) to LPC, and LSP are more amenable to quantization. LSP are calculated by factoring the sum and difference polynomials

    P(z) = A(z) + z^(-11) A(1/z)
    Q(z) = A(z) - z^(-11) A(1/z)

  where A(z) = 1 - a_1 z^(-1) - ... - a_10 z^(-10). The roots of P(z) and Q(z) lie on the unit circle, and their angles w_1, ..., w_10 are called the LSP parameters.
• LSP are ordered and bounded:

    0 < w_1 < w_2 < ... < w_10 < pi

• LSP are more correlated from one frame to the next than LPC.
• The frame size is 20 msec, so there are 50 frames/sec and 2400 bps is equivalent to 48 bits/frame. These bits are allocated among the LSP parameters, the gain, the pitch period, and the voiced/unvoiced decision (allocation table not reproduced).
• The 34 bits for the LSP are allocated across the ten parameters (allocation table not reproduced).
• The gain, G, is encoded using a 7-bit non-uniform scalar quantizer (a 1-dimensional vector quantizer).
• For voiced speech, the pitch period ranges from 20 to 146.
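The LSP construction can be sketched with NumPy. This follows the standard sum/difference-polynomial definition; it is shown here at order 2 for brevity, whereas the vocoder above uses order 10:

```python
import numpy as np

def lpc_to_lsp(a):
    """From A(z) = 1 - sum a_k z^-k, form P(z) = A(z) + z^-(M+1) A(1/z)
    and Q(z) = A(z) - z^-(M+1) A(1/z); the angles of their unit-circle
    roots, sorted, are the LSP parameters."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    p = np.concatenate((A, [0.0])) + np.concatenate(([0.0], A[::-1]))
    q = np.concatenate((A, [0.0])) - np.concatenate(([0.0], A[::-1]))
    angles = np.angle(np.concatenate((np.roots(p), np.roots(q))))
    # Keep one angle per conjugate pair, dropping the trivial roots at
    # z = 1 (angle 0) and z = -1 (angle pi).
    return np.sort(angles[(angles > 1e-9) & (angles < np.pi - 1e-9)])

# Stable order-2 filter: the LSPs come out ordered and inside (0, pi).
print(lpc_to_lsp([1.3, -0.9]))
```

The interlacing of the P-roots and Q-roots is what makes the ordering property hold, and checking that ordering after quantization is a cheap stability test.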
V. 4.8 kbps CELP Coder

• CELP = Code-Excited Linear Prediction.
• The principle is similar to the LPC Vocoder except:
  o Frame size is 30 msec (240 samples).
  o The innovation is coded directly, so more bits are needed.
  o It is computationally more complex.
  o A pitch prediction filter is included.
  o The vector quantization concept is used.
• A block diagram of the CELP encoder is shown below (block diagram not reproduced).
• The pitch prediction filter is given by:

    1 / (1 - b z^(-P))

  where the pitch period P could be an integer or a fraction thereof; the pitch parameters are jointly encoded (encoding table not reproduced).
• The perceptual weighting filter is given by:

    W(z) = A(z/c_1) / A(z/c_2)

  where certain values of the constants c_1 and c_2 have been determined to be good choices.
• Each frame is divided into 4 subframes. In each subframe, the codebook contains 512 codevectors. The gain is quantized using 5 bits per subframe. The LSP parameters are quantized using 34 bits, similar to the LPC Vocoder.
• At 30 msec per frame, 4.8 kbps is equivalent to 144 bits/frame (allocation table not reproduced).
VI. 8.0 kbps CS-ACELP

• CS-ACELP = Conjugate-Structured Algebraic CELP.
• The principle is similar to the 4.8 kbps CELP Coder except:
  o Frame size is 10 msec (80 samples).
  o There are only two subframes, each of which is 5 msec (40 samples).
  o The LSP parameters are encoded using two-stage vector quantization.
  o The gains are also encoded using vector quantization.
• At 10 msec per frame, 8 kbps is equivalent to 80 bits/frame (allocation table not reproduced).
VII. Demonstration This is a demonstration of five different speech compression algorithms (ADPCM, LD-CELP, CS-ACELP, CELP, and LPC10). To use this demo, you need a Sun Audio (.au) Player. To distinguish subtle differences in the speech files, high-quality speakers and/or headphones are recommended. Also, it is recommended that you run this demo in a quiet room (with a low level of background noise).
"A lathe is a big tool. Grab every dish of sugar."
• Original (64000 bps): The original speech signal sampled at 8000 samples/second and mu-law quantized at 8 bits/sample. Approximately 4 seconds of speech.
• ADPCM (32000 bps): Speech compressed using the Adaptive Differential Pulse Code Modulation (ADPCM) scheme. The bit rate is 4 bits/sample (compression ratio of 2:1).
• LD-CELP (16000 bps): Speech compressed using the Low-Delay Code Excited Linear Prediction (LD-CELP) scheme. The bit rate is 2 bits/sample (compression ratio of 4:1).
• CS-ACELP (8000 bps): Speech compressed using the Conjugate-Structured Algebraic Code Excited Linear Prediction (CS-ACELP) scheme. The bit rate is 1 bit/sample (compression ratio of 8:1).
• CELP (4800 bps): Speech compressed using the Code Excited Linear Prediction (CELP) scheme. The bit rate is 0.6 bits/sample (compression ratio of 13.3:1).
• LPC10 (2400 bps): Speech compressed using the Linear Predictive Coding (LPC10) scheme. The bit rate is 0.3 bits/sample (compression ratio of 26.7:1).
IMAGE COMPRESSING TECHNIQUES – JPEG

JPEG Compression
One of the hottest topics in image compression technology today is JPEG. The acronym JPEG stands for the Joint Photographic Experts Group, a standards committee that had its origins within the International Organization for Standardization (ISO).
In 1982, the ISO formed the Photographic Experts Group (PEG) to research methods of transmitting video, still images, and text over ISDN (Integrated Services Digital Network) lines. PEG's goal was to produce a set of industry standards for the transmission of graphics and image data over digital communications networks. In 1986, a subgroup of the CCITT began to research methods of compressing color and gray-scale data for facsimile transmission. The compression methods needed for color facsimile systems were very similar to those being researched by PEG, so it was agreed that the two groups should combine their resources and work together toward a single standard. In 1987, the ISO and CCITT combined their two groups into a joint committee that would research and produce a single standard of image data compression for both organizations to use. This new committee was JPEG.
Although the creators of JPEG might have envisioned a multitude of commercial applications for JPEG technology, a consumer public made hungry by the marketing promises of imaging and multimedia technology is benefiting greatly as well.
Most previously developed compression methods do a relatively poor job of compressing continuous-tone image data; that is, images containing hundreds or thousands of colors taken from real-world subjects. And very few file formats can support 24-bit raster images. GIF, for example, can store only images with a maximum pixel depth of eight bits, for a maximum of 256 colors. And its LZW compression algorithm does not work very well on typical scanned image data: the low-level noise commonly found in such data defeats LZW's ability to recognize repeated patterns.
Both TIFF and BMP are capable of storing 24-bit data, but in their pre-JPEG versions they could use only encoding schemes (LZW and RLE, respectively) that do not compress this type of image data very well. JPEG provides a compression method that is capable of compressing continuous-tone image data with a pixel depth of 6 to 24 bits with reasonable speed and efficiency. And although JPEG itself does not define a standard image file format, several have been invented or modified to fill the needs of JPEG data storage.

JPEG in Perspective
Unlike all of the other compression methods described so far in this chapter, JPEG is not a single algorithm. Instead, it may be thought of as a toolkit of image compression methods that may be altered to fit the needs of the user. JPEG may be adjusted to produce very small, compressed images that are of relatively poor quality in appearance but still suitable for many applications. Conversely, JPEG is capable of producing very high-quality compressed images that are still far smaller than the original uncompressed data.
JPEG is also different in that it is primarily a lossy method of compression. Most popular image format compression schemes, such as RLE, LZW, or the CCITT standards, are lossless compression methods. That is, they do not discard any data during the encoding process. An image compressed using a lossless method is guaranteed to be identical to the original image when uncompressed. Lossy schemes, on the other hand, throw useless data away during encoding; this is, in fact, how lossy schemes manage to obtain superior compression ratios over most lossless schemes. JPEG was designed specifically to discard information that the human eye cannot easily see. Slight changes in color are not perceived well by the human eye, while slight changes in intensity (light and dark) are. Therefore JPEG's lossy encoding tends to be more frugal with the gray-scale part of an image and more frivolous with the color.
JPEG was designed to compress color or gray-scale continuous-tone images of real-world subjects: photographs, video stills, or any complex graphics that resemble natural subjects. Animations, ray tracing, line art, black-and-white documents, and typical vector graphics don't compress very well under JPEG and shouldn't be expected to. And, although JPEG is now used to provide motion video compression, the standard makes no special provision for such an application.
The fact that JPEG is lossy and works only on a select type of image data might make you ask, "Why bother to use it?" It depends upon your needs. JPEG is an excellent way to store 24-bit photographic images, such as those used in imaging and multimedia applications. JPEG 24-bit (16 million color) images are superior in appearance to 8-bit (256 color) images on a VGA display and are at their most spectacular when using 24-bit display hardware (which is now quite inexpensive).
The amount of compression achieved depends upon the content of the image data. A typical photographic-quality image may be compressed from 20:1 to 25:1 without experiencing any noticeable degradation in quality. Higher compression ratios will result in image files that differ noticeably from the original image but still have an overall good image quality. And achieving a 20:1 or better compression ratio in many cases not only saves disk space, but also reduces transmission time across data networks and phone lines.
An end user can "tune" the quality of a JPEG encoder using a parameter sometimes called a quality setting or a Q factor. Although different implementations have varying scales of Q factors, a range of 1 to 100 is typical. A factor of 1 produces the smallest, worst quality images; a factor of 100 produces the largest, best quality images. The optimal Q factor depends on the image content and is therefore different for every image.
The art of JPEG compression is finding the lowest Q factor that produces an image that is visibly acceptable, and preferably as close to the original as possible. The JPEG library supplied by the Independent JPEG Group uses a quality setting scale of 1 to 100. To find the optimal compression for an image using the JPEG library, follow these steps:

1. Encode the image using a quality setting of 75 (-Q 75).
2. If you observe unacceptable defects in the image, increase the value and re-encode the image.
3. If the image quality is acceptable, decrease the setting until the image quality is barely acceptable. This will be the optimal quality setting for this image.
4. Repeat this process for every image you have (or just encode them all using a quality setting of 75).
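The tuning loop above can be written down as a small search. `looks_ok` here is a hypothetical stand-in for judging the image re-encoded at a given quality; a real workflow would encode with cjpeg or an imaging library and inspect the result visually:

```python
def find_optimal_quality(looks_ok, start=75):
    """Sketch of the Q-factor tuning procedure: start at 75, raise the
    quality while defects are visible, then lower it until the image is
    barely acceptable."""
    q = start
    while not looks_ok(q) and q < 100:      # defects visible: raise quality
        q += 5
    while q - 5 >= 1 and looks_ok(q - 5):   # acceptable: lower until barely so
        q -= 5
    return q

# Pretend images look acceptable at quality >= 60.
best = find_optimal_quality(looks_ok=lambda q: q >= 60)
print(best)   # 60
```

The step size of 5 is an arbitrary choice for the sketch; a binary search over the 1-100 range would reach the answer in fewer encodes.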
JPEG isn't always an ideal compression solution. There are several reasons:

• As we have said, JPEG doesn't fit every compression need. Images containing large areas of a single color do not compress very well. In fact, JPEG will introduce "artifacts" into such images that are visible against a flat background, making them considerably worse in appearance than if you used a conventional lossless compression method. Images of a "busier" composition contain even worse artifacts, but they are considerably less noticeable against the image's more complex background.
• JPEG can be rather slow when it is implemented only in software. If fast decompression is required, a hardware-based JPEG solution is your best bet, unless you are willing to wait for a faster software-only solution to come along or buy a faster computer.
• JPEG is not trivial to implement. It is not likely you will be able to sit down and write your own JPEG encoder/decoder in a few evenings. We recommend that you obtain a third-party JPEG library, rather than writing your own.
• JPEG is not supported by very many file formats. The formats that do support JPEG are all fairly new and can be expected to be revised at frequent intervals.
Baseline JPEG
The JPEG specification defines a minimal subset of the standard called baseline JPEG, which all JPEG-aware applications are required to support. This baseline uses an encoding scheme based on the Discrete Cosine Transform (DCT) to achieve compression. DCT is a generic name for a class of operations identified and published some years ago. DCT-based algorithms have since made their way into various compression methods.
DCT-based encoding algorithms are always lossy by nature. DCT algorithms are capable of achieving a high degree of compression with only minimal loss of data. This scheme is effective only for compressing continuous-tone images in which the differences between adjacent pixels are usually small. In practice, JPEG works well only on images with depths of at least four or five bits per color channel. The baseline standard actually specifies eight bits per input sample. Data of lesser bit depth can be handled by scaling it up to eight bits per sample, but the results will be bad for low-bit-depth source data, because of the large jumps between adjacent pixel values. For similar reasons, color-mapped source data does not work very well, especially if the image has been dithered.
The JPEG compression scheme is divided into the following stages:

1. Transform the image into an optimal color space.
2. Downsample chrominance components by averaging groups of pixels together.
3. Apply a Discrete Cosine Transform (DCT) to blocks of pixels, thus removing redundant image data.
4. Quantize each block of DCT coefficients using weighting functions optimized for the human eye.
5. Encode the resulting coefficients (image data) using a Huffman variable word-length algorithm to remove redundancies in the coefficients.
Figure 9-11 summarizes these steps, and the following subsections look at each of them in turn. Note that JPEG decoding performs the reverse of these steps.
Figure 9-11: JPEG compression and decompression
Transform the image
The JPEG algorithm is capable of encoding images that use any type of color space. JPEG itself encodes each component in a color model separately, and it is completely independent of any color-space model, such as RGB, HSI, or CMY. The best compression ratios result if a luminance/chrominance color space, such as YUV or YCbCr, is used. (See Chapter 2 for a description of these color spaces.)
Most of the visual information to which human eyes are most sensitive is found in the high-frequency, gray-scale, luminance component (Y) of the YCbCr color space. The other two chrominance components (Cb and Cr) contain high-frequency color information to which the human eye is less sensitive. Most of this information can therefore be discarded.
In comparison, the RGB, HSI, and CMY color models spread their useful visual image information evenly across each of their three color components, making the selective discarding of information very difficult. All three color components would need to be encoded at the highest quality, resulting in a poorer compression ratio. Gray-scale images do not have a color space as such and therefore do not require transforming.
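A common choice for this transform (the one used by JFIF files) is the full-range BT.601 conversion, which can be sketched as:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr conversion: Y carries the
    luminance the eye is most sensitive to, while Cb and Cr can be
    downsampled and quantized more heavily."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# Neutral colors land at Cb = Cr = 128 (no chrominance).
print([round(v, 2) for v in rgb_to_ycbcr(255, 255, 255)])  # [255.0, 128.0, 128.0]
print([round(v, 2) for v in rgb_to_ycbcr(0, 0, 0)])        # [0.0, 128.0, 128.0]
```

The transform is invertible (up to rounding), so this step by itself loses essentially nothing; it just separates the data so later steps can discard the right parts.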
Downsample chrominance components
The simplest way of exploiting the eye's lesser sensitivity to chrominance information is simply to use fewer pixels for the chrominance channels. For example, in an image nominally 1000x1000 pixels, we might use a full 1000x1000 luminance pixels but only 500x500 pixels for each chrominance component. In this representation, each chrominance pixel covers the same area as a 2x2 block of luminance pixels. We store a total of six pixel values for each 2x2 block (four luminance values, one each for the two chrominance channels), rather than the twelve values needed if each component is represented at full resolution. Remarkably, this 50 percent reduction in data volume has almost no effect on the perceived quality of most images. Equivalent savings are not possible with conventional color models such as RGB, because in RGB each color channel carries some luminance information and so any loss of resolution is quite visible.
When the uncompressed data is supplied in a conventional format (equal resolution for all channels), a JPEG compressor must reduce the resolution of the chrominance channels by downsampling, or averaging together groups of pixels. The JPEG standard allows several different choices for the sampling ratios, or relative sizes, of the downsampled channels. The luminance channel is always left at full resolution (1:1 sampling). Typically both chrominance channels are downsampled 2:1 horizontally and either 1:1 or 2:1 vertically, meaning that a chrominance pixel covers the same area as either a 2x1 or a 2x2 block of luminance pixels. JPEG refers to these downsampling processes as 2h1v and 2h2v sampling, respectively. Another notation commonly used is 4:2:2 sampling for 2h1v and 4:2:0 sampling for 2h2v; this notation derives from television customs (color transformation and downsampling have been in use since the beginning of color TV transmission).
2h1v sampling is fairly common because it corresponds to National Television Standards Committee (NTSC) standard TV practice, but it offers less compression than 2h2v sampling, with hardly any gain in perceived quality.
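The 2h2v (4:2:0) case, averaging each 2x2 block of chrominance pixels into one value, can be sketched as:

```python
def downsample_2h2v(plane):
    """2h2v (4:2:0) downsampling: average each 2x2 block of a
    chrominance plane into one value. `plane` is a list of
    equal-length rows with even dimensions."""
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[0]), 2):
            total = (plane[y][x] + plane[y][x + 1] +
                     plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(total / 4.0)
        out.append(row)
    return out

cb = [[10, 20, 30, 40],
      [10, 20, 30, 40]]
print(downsample_2h2v(cb))   # [[15.0, 35.0]]
```

A 2h1v downsampler would average only horizontal pairs, keeping full vertical resolution.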
Apply a Discrete Cosine Transform
The image data is divided up into 8x8 blocks of pixels. (From this point on, each color component is processed independently, so a "pixel" means a single value, even in a color image.) A DCT is applied to each 8x8 block. DCT converts the spatial image representation into a frequency map: the low-order or "DC" term represents the average value in the block, while successive higher-order ("AC") terms represent the strength of more and more rapid changes across the width or height of the block. The highest AC term represents the strength of a cosine wave alternating from maximum to minimum at adjacent pixels.
The DCT calculation is fairly complex; in fact, this is the most costly step in JPEG compression. The point of doing it is that we have now separated out the high- and low-frequency information present in the image. We can discard high-frequency data easily without losing low-frequency information. The DCT step itself is lossless except for roundoff errors.
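A direct (unoptimized) implementation of the 8x8 DCT makes the DC/AC behavior concrete; production codecs use fast factored versions of this same transform:

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of an 8-point vector, straight from the
    definition."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """8x8 DCT: transform rows, then columns (the DCT is separable)."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[y][x] for y in range(8)]) for x in range(8)]
    return [[cols[x][y] for x in range(8)] for y in range(8)]

flat = [[100] * 8 for _ in range(8)]       # a perfectly flat block
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))                 # DC term: 8 x the block average
print(round(abs(coeffs[0][1]), 6))         # AC terms vanish for a flat block
```

For this flat block all the energy lands in the DC term, which is exactly why smooth image regions compress so well: almost every AC coefficient quantizes to zero.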
Quantize each block
To discard an appropriate amount of information, the compressor divides each DCT output value by a "quantization coefficient" and rounds the result to an integer. The larger the quantization coefficient, the more data is lost, because the actual DCT value is represented less and less accurately. Each of the 64 positions of the DCT output block has its own quantization coefficient, with the higher-order terms being quantized more heavily than the low-order terms (that is, the higher-order terms have larger quantization coefficients). Furthermore, separate quantization tables are employed for luminance and chrominance data, with the chrominance data being quantized more heavily than the luminance data. This allows JPEG to exploit further the eye's differing sensitivity to luminance and chrominance.
It is this step that is controlled by the "quality" setting of most JPEG compressors. The compressor starts from a built-in table that is appropriate for a medium-quality setting and increases or decreases the value of each table entry in inverse proportion to the requested quality. The complete quantization tables actually used are recorded in the compressed file so that the decompressor will know how to (approximately) reconstruct the DCT coefficients.
Selection of an appropriate quantization table is something of a black art. Most existing compressors start from a sample table developed by the ISO JPEG committee. It is likely that future research will yield better tables that provide more compression for the same perceived image quality. Implementation of improved tables should not cause any compatibility problems, because decompressors merely read the tables from the compressed file; they don't care how the table was picked.
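The quantization step itself is just a divide-and-round per coefficient. The table below is a made-up illustration of the "larger entries for higher-order terms" shape, not one of the ISO sample tables:

```python
def quantize(coeffs, qtable, quality_scale=1.0):
    """Divide each DCT coefficient by its (scaled) table entry and
    round; dequantizing multiplies back, so only the rounding error
    is lost."""
    return [[round(coeffs[y][x] / (qtable[y][x] * quality_scale))
             for x in range(8)] for y in range(8)]

# Toy table: low-order terms kept precise, high-order terms coarse.
qtable = [[1 + 2 * (x + y) for x in range(8)] for y in range(8)]
coeffs = [[100.0 / (1 + x + y) for x in range(8)] for y in range(8)]
quantized = quantize(coeffs, qtable)
print(quantized[0][0], quantized[7][7])   # 100 0
```

The small high-order coefficients round to zero, producing the long runs of zeros that the final entropy-coding stage compresses so effectively.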
Encode the resulting coefficients
The resulting coefficients contain a significant amount of redundant data. Huffman compression will losslessly remove the redundancies, resulting in smaller JPEG data. An optional extension to the JPEG specification allows arithmetic encoding to be used instead of Huffman for an even greater compression ratio. (See the section called "JPEG Extensions (Part 1)" below.) At this point, the JPEG data stream is ready to be transmitted across a communications channel or encapsulated inside an image file format.

JPEG Extensions (Part 1)
What we have examined thus far is only the baseline specification for JPEG. A number of extensions have been defined in Part 1 of the JPEG specification that provide progressive image buildup, improved compression ratios using arithmetic encoding, and a lossless compression scheme. These features are beyond the needs of most JPEG implementations and have therefore been defined as "not required to be supported" extensions to the JPEG standard.
Progressive image buildup
Progressive image buildup is an extension for use in applications that need to receive JPEG data streams and display them on the fly. A baseline JPEG image can be displayed only after all of the image data has been received and decoded. But some applications require that the image be displayed after only some of the data is received. Using a conventional compression method, this means displaying the first few scan lines of the image as it is decoded. In this case, even if the scan lines were interlaced, you would need at least 50 percent of the image data to get a good clue as to the content of the image. The progressive buildup extension of JPEG offers a better solution.
Progressive buildup allows an image to be sent in layers rather than scan lines. But instead of transmitting each bitplane or color channel in sequence (which wouldn't be very useful), a succession of images built up from approximations of the original image is sent. The first scan provides a low-accuracy representation of the entire image--in effect, a very low-quality JPEG compressed image. Subsequent scans gradually refine the image by increasing the effective quality factor. If the data is displayed on the fly, you would first see a crude, but recognizable, rendering of the whole image. This would appear very quickly because only a small amount of data would need to be transmitted to produce it. Each subsequent scan would improve the displayed image's quality one block at a time.
A limitation of progressive JPEG is that each scan takes essentially a full JPEG decompression cycle to display. Therefore, with typical data transmission rates, a very fast JPEG decoder (probably specialized hardware) would be needed to make effective use of progressive transmission. A related JPEG extension provides for hierarchical storage of the same image at multiple resolutions.
For example, an image might be stored at 250x250, 500x500, 1000x1000, and 2000x2000 pixels, so that the same image file could support display on low-resolution screens, medium-resolution laser printers, and high-resolution imagesetters. The higher-resolution images are stored as differences from the lower-resolution ones, so they need less space than they would need if they were stored independently. This is not the same as a progressive series, because each image is available in its own right at the full desired quality.
Arithmetic encoding
The baseline JPEG standard defines Huffman compression as the final step in the encoding process. A JPEG extension replaces the Huffman engine with a binary arithmetic entropy encoder. The use of an arithmetic coder reduces the resulting size of the JPEG data by a further 10 to 15 percent over the results that would be achieved by the Huffman coder. With no change in resulting image quality, this gain could be of importance in implementations where enormous quantities of JPEG images are archived.
Arithmetic encoding has several drawbacks:

• Not all JPEG decoders support arithmetic decoding. Baseline JPEG decoders are required to support only the Huffman algorithm.
• The arithmetic algorithm is slower in both encoding and decoding than Huffman.
• The arithmetic coder used by JPEG (called a Q-coder) is owned by IBM and AT&T. (Mitsubishi also holds patents on arithmetic coding.) You must obtain a license from the appropriate vendors if their Q-coders are to be used as the back end of your JPEG implementation.
Lossless JPEG compression
A question that commonly arises is "At what Q factor does JPEG become lossless?" The answer is "never." Baseline JPEG is a lossy method of compression regardless of adjustments you may make in the parameters. In fact, DCT-based encoders are always lossy, because roundoff errors are inevitable in the color conversion and DCT steps. You can suppress deliberate information loss in the downsampling and quantization steps, but you still won't get an exact recreation of the original bits. Further, this minimum-loss setting is a very inefficient way to use lossy JPEG.
The JPEG standard does offer a separate lossless mode. This mode has nothing in common with the regular DCT-based algorithms, and it is currently implemented only in a few commercial applications. JPEG lossless is a form of Predictive Lossless Coding using a 2D Differential Pulse Code Modulation (DPCM) scheme. The basic premise is that the value of a pixel is combined with the values of up to three neighboring pixels to form a predictor value. The predictor value is then subtracted from the original pixel value. When the entire bitmap has been processed, the resulting predictors are compressed using either the Huffman or the binary arithmetic entropy encoding methods described in the JPEG standard.
Lossless JPEG works on images with 2 to 16 bits per pixel, but performs best on images with 6 or more bits per pixel. For such images, the typical compression ratio achieved is 2:1. For image data with fewer bits per pixel, other compression schemes perform better.

JPEG Extensions (Part 3)
The following JPEG extensions are described in Part 3 of the JPEG specification.
Variable quantization
Variable quantization is an enhancement available to the quantization procedure of DCT-based processes. This enhancement may be used with any of the DCT-based processes defined by JPEG, with the exception of the baseline process. The process of quantization used in JPEG quantizes each of the 64 DCT coefficients using a corresponding value from a quantization table. Quantization values may be redefined prior to the start of a scan but must not be changed once they are within a scan of the compressed data stream. Variable quantization allows the scaling of quantization values within the compressed data stream. At the start of each 8x8 block is a quantizer scale factor, used to scale the quantization table values within an image component
and to match these values with the AC coefficients stored in the compressed data. Quantization values may then be located and changed as needed. Variable quantization allows the characteristics of an image to be changed to control the quality of the output based on a given model. The variable quantizer can constantly adjust during decoding to provide optimal output. The amount of output data can also be decreased or increased by raising or lowering the quantizer scale factor. Constant adaptive adjustments by the variable quantizer can also hold the resulting JPEG file or data stream to a maximum size. The variable quantization extension also allows JPEG to store image data originally encoded using a variable quantization scheme, such as MPEG. For MPEG data to be accurately transcoded into another format, the other format must support variable quantization to maintain a high compression ratio. This extension allows JPEG to support a data stream originally derived from a variably quantized source, such as an MPEG I-frame.
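As an illustration of the scale-factor idea, here is a minimal sketch. The function names are ours, for illustration only; a real codec works on full 8x8 blocks and entropy-codes the results.

```python
# Illustrative sketch of variable quantization: a per-block scale
# factor multiplies the base quantization table before the DCT
# coefficients are quantized.

def scale_table(base_table, scale):
    """Scale each quantization value; clamp to at least 1."""
    return [max(1, round(q * scale)) for q in base_table]

def quantize(coeffs, table):
    return [round(c / q) for c, q in zip(coeffs, table)]

def dequantize(levels, table):
    return [l * q for l, q in zip(levels, table)]

base = [16, 11, 10, 16, 24, 40, 51, 61]   # first row of a typical luminance table
coeffs = [-415, -30, -61, 27, 56, -20, -2, 0]

# A larger scale factor means coarser quantization: fewer bits, lower quality.
for scale in (0.5, 1.0, 2.0):
    table = scale_table(base, scale)
    print(scale, quantize(coeffs, table))
```

Raising the scale factor drives more coefficients to zero, which is exactly how the quantizer trades output size against quality.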
Selective refinement
Selective refinement is used to select a region of an image for further enhancement. This enhancement improves the resolution and detail of a region of an image. JPEG supports three types of selective refinement: hierarchical, progressive, and component. Each of these refinement processes differs in its application, effectiveness, complexity, and amount of memory required.
• Hierarchical selective refinement is used only in the hierarchical mode of operation. It allows a region of a frame to be refined by the next differential frame of a hierarchical sequence.
• Progressive selective refinement is used only in the progressive mode. It allows greater bit resolution of zero and non-zero DCT coefficients in a coded region of a frame.
• Component selective refinement may be used in any mode of operation. It allows a region of a frame to contain fewer colors than are defined in the frame header.
Image tiling
Tiling is used to divide a single image into two or more smaller subimages. Tiling allows easier buffering of the image data in memory, quicker random access of the image data on disk, and the storage of images larger than 64Kx64K samples in size. JPEG supports three types of tiling: simple, pyramidal, and composite.
• Simple tiling divides an image into two or more fixed-size tiles. All simple tiles are coded from left to right and from top to bottom and are contiguous and non-overlapping. All tiles must have the same number of samples and component identifiers and must be encoded using the same processes. Tiles on the bottom and right of the image may be smaller than the designated tile size when the image dimensions are not a multiple of the tile size.
• Pyramidal tiling also divides the image into tiles, but each tile is further tiled at several different levels of resolution. The model of this process is the JPEG Tiled Image Pyramid (JTIP), which is a model of how to create a multi-resolution pyramidal JPEG image. A JTIP image stores successive layers of the same image at different resolutions. The first image stored at the top of the pyramid is one-sixteenth of the defined screen size and is called a vignette. This image is used for quick displays of image contents, especially for file browsers. The next image occupies one-fourth of the screen and is called an imagette. This image is typically used when two or more images must be displayed at the same time on the screen. The next is a low-resolution, full-screen image, followed by successively higher-resolution images and ending with the original image. Pyramidal tiling typically uses the process of "internal tiling," where each tile is encoded as part of the same JPEG data stream. Tiles may optionally use the process of "external tiling," where each tile is a separately encoded JPEG data stream. External tiling may allow quicker access of image data, easier application of image encryption, and enhanced compatibility with certain JPEG decoders.
• Composite tiling allows multiple-resolution versions of images to be stored and displayed as a mosaic. Composite tiling allows overlapping tiles that may be different sizes and have different scaling factors and compression parameters. Each tile is encoded separately and may be combined with other tiles without resampling.
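The simple-tiling layout described above (fixed-size, non-overlapping tiles coded left to right and top to bottom, with smaller tiles at the right and bottom edges) can be sketched as:

```python
# A minimal sketch of "simple tiling": divide an image into fixed-size,
# non-overlapping tiles. Edge tiles may be smaller when the image
# dimensions are not a multiple of the tile size.

def tile_rects(width, height, tile_w, tile_h):
    """Yield (x, y, w, h) rectangles covering the image, row by row."""
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            yield (x, y, min(tile_w, width - x), min(tile_h, height - y))

# A 100x60 image with 32x32 tiles: 4 columns x 2 rows = 8 tiles.
# The last column is only 4 pixels wide; the last row is 28 tall.
tiles = list(tile_rects(100, 60, 32, 32))
print(tiles)
```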
SPIFF (Still Picture Interchange File Format) SPIFF is an officially sanctioned JPEG file format that is intended to replace the de facto JFIF (JPEG File Interchange Format) in use today. SPIFF includes all of the features of JFIF and adds quite a bit more functionality. SPIFF is designed so that properly written JFIF readers will read SPIFF-JPEG files as well. For more information, see the article about SPIFF.
Other extensions Other JPEG extensions include the addition of a version marker segment that stores the minimum level of functionality required to decode the JPEG data stream. Multiple version markers may be included to mark areas of the data stream that have differing minimum functionality requirements. The version marker also contains information indicating the processes and extensions used to encode the JPEG data stream.
IMAGE FORMATS There are three major graphics formats on the web: GIF, JPEG, and PNG. Of these, PNG has the spottiest support, so that generally leaves one to choose between the GIF and JPEG formats. There are many other formats in which to save image files, but if you use them, it is likely that many of your web site visitors will not be able to view your files.
JPEG JPEG is a lossy compression technology, so some information is lost when converting a picture to JPEG. Use this format for most photographs because the images will be smaller and look better than a GIF format picture.
GIF GIF files are better for figures with sharp contrast (such as line drawings, Gantt charts, logos, and buttons). One can also create transparent areas and animations with GIF images. A GIF image has a maximum of 256 colors, however, so images with gradations of color will not look very good.
PNG GIF is a patented file format technology. PNG is an open standard that can be used for many of the applications of GIF images. PNG is better than GIF in most respects, providing more possible colors, alpha-channel transparency, and color matching features. The PNG format is not as widely supported as GIF, although it is supported (to differing degrees) on the version 4 and later browsers.
BMP
BMP or bitmap files are pictures from the Windows operating system. Using these on a web page can cause problems because they cannot be viewed by most browsers. Stay away from using BMP files on the web.
TIFF
TIFF images have great picture quality but also a very large file size. Most browsers cannot display TIFF images. Use TIFF on your machine to save images for printing or editing; do not use TIFFs on the web.
The GIF image format GIF stands for Graphics Interchange Format. It is probably the most common image format used on the Web. GIFs have the advantage of usually being very small in size, which makes them fast-loading. Unlike JPEGs, GIFs use lossless compression, which means they make the file size small without losing or blurring any of the image itself. GIFs also support transparency, which means that they can sit on top of a background image on your web page without having ugly rectangles around them. Another cool thing that GIFs can do is animation. You can make an animated GIF by drawing each frame of the animation in a graphics package that supports the animated GIF format, then exporting the animation to a single GIF file. When you include this file in your Web page (with the img tag), your animation will be displayed on the page! The major disadvantage of GIFs is that they only support up to 256 colours (this is known as 8-bit colour and is a type of indexed colour image). This means they're not good for photographs, or any other image that contains lots of different colours.
Making Fast-Loading GIFs It's worthwhile making your GIF file sizes as small as possible, so that your Web pages load quickly. People will get very bored otherwise, and probably go to another website! Most graphics programs let you control various settings when making a GIF image, such as palette size (number of colours in the image) and dithering. Generally speaking, use the smallest palette size you can. Usually a 32-colour palette produces acceptable results, although for low-colour images you can often get away with 16. Images with lots of colours will of course need a bigger palette - say, 128, or even 256 colours.
8-colour GIF (1292 bytes)
64-colour GIF (2940 bytes)
The JPEG Image Format JPEG stands for Joint Photographic Experts Group, a bunch of boffins who invented this format to display full-colour photographic images in a portable format with a small file size. Like GIF images, they are also very common on the Web. Their main advantage over GIFs is that they can display true-colour images (up to 16 million colours), which makes them much better for images such as photographs and illustrations with large numbers of colours. The main disadvantage of the JPEG format is that it is lossy. This means that you lose some of the detail of your image when you convert it to JPEG format. Boundaries between blocks of colour may appear more blurry, and areas with lots of detail will lose their sharpness. On the other hand, JPEGs do preserve all of the colour information in the image, which of course is great for high-colour images such as photographs. JPEGs also can't do transparency or animation - in these cases, you'll have to use the GIF format (or PNG format for transparency).
Making Fast-Loading JPEGs As with GIFs, it pays to make your JPEGs as small as possible (in terms of bytes), so that your websites load quickly. The main control over file size with JPEGs is called quality, and usually varies from 0 to 100%, where 0% is low quality (but smallest file size), and 100% is highest quality (but largest file size). 0% quality JPEGs usually look noticeably blurred when compared to the original. 100% quality JPEGs are often indistinguishable from the original:
Low-quality JPEG (4089 bytes)
High-quality JPEG (17465 bytes)
The PNG Image Format PNG is a relatively new invention compared to GIF or JPEG, although it's been around for a while now. (Sadly some browsers such as IE6 still don't support them fully.) It stands for Portable Network Graphics. It was designed to be an alternative to the GIF file format, but without the licensing issues that were involved in the GIF compression method at the time.
There are two types of PNG: PNG-8 format, which holds 8 bits of colour information (comparable to GIF), and PNG-24 format, which holds 24 bits of colour (comparable to JPEG). PNG-8 often compresses images even better than GIF, resulting in smaller file sizes. On the other hand, PNG-24 is often less effective than JPEG at compressing true-colour images such as photos, resulting in larger file sizes than the equivalent quality JPEGs. However, unlike JPEG, PNG-24 is lossless, meaning that all of the original image's information is preserved.

PNG also supports transparency like GIF, but can have varying degrees of transparency for each pixel, whereas GIFs can only have transparency turned on or off for each pixel. This means that whereas transparent GIFs often have jagged edges when placed on complex or ill-matching backgrounds, transparent PNGs will have nice smooth edges. Note that unlike GIF, PNG-8 does not support animation.

One important point about PNG: earlier browsers don't recognise them. If you want to ensure your website is viewable by early browsers, use GIFs or JPEGs instead.
16-colour PNG-8 (6481 bytes)
Full-colour PNG-24 (34377 bytes)
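The per-pixel alpha blending that gives transparent PNGs their smooth edges can be sketched as follows. This is the standard compositing formula for a single channel, not PNG-specific code:

```python
# Per-pixel alpha blending: each pixel mixes foreground and background
# in proportion to its alpha value, instead of GIF's all-or-nothing
# transparency.

def blend(fg, bg, alpha):
    """alpha = 1.0 is fully opaque, 0.0 is fully transparent."""
    return round(alpha * fg + (1 - alpha) * bg)

# An edge pixel that is half foreground (black) on a white background
# comes out mid-grey rather than a jagged black-or-white choice.
print(blend(0, 255, 0.5))
```

GIF can only pick one of the two endpoints for each pixel, which is why its edges look jagged against ill-matching backgrounds.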
Summary of image formats This summary lists the key differences between the GIF, JPEG and PNG image formats.

GIF: better for clipart and drawn graphics with few colours, or large blocks of colour; can only have up to 256 colours; "lossless" (images contain the same amount of information as the original, but with only 256 colours); can be animated; can have transparent areas.

JPEG: better for photographs with lots of colours or fine colour detail; can have up to 16 million colours; "lossy" (images contain less information than the original); cannot be animated; cannot have transparent areas.

PNG-8: better for clipart and drawn graphics with few colours, or large blocks of colour; can only have up to 256 colours; "lossless" (but with only 256 colours); cannot be animated; can have transparent areas.

PNG-24: better for photographs with lots of colours or fine colour detail; can have up to 16 million colours; "lossless" (images contain the same amount of information as the original); cannot be animated; can have transparent areas.
Image or Graphic? Technically, neither. If you really want to be strict, computer pictures are files, the same way WORD documents or solitaire games are files. They're all a bunch of ones and zeros all in a row. But we do have to communicate with one another, so let's decide. Image. We'll use "image". That seems to cover a wide enough topic range. I went to my reference books and there I found that "graphic" is more of an adjective, as in "graphic format." You see, we denote images on the Internet by their graphic format. GIF is not the name of the image. GIF is the name of the compression scheme used to create the raster format set up by CompuServe. (More on that in a moment.) So, they're all images unless you're talking about something specific.
44 Different Graphic Formats? It does seem like a big number, doesn't it? In reality, there are not 44 different graphic format names. Many of the 44 are different versions under the same compression umbrella - interlaced and non-interlaced GIF, for example. Before getting into where we get all 44, and there are more than that even, let me back-pedal for a moment. There actually are only two basic methods for a computer to render, or store and display, an image. When you save an image in a specific format, you are creating either a raster or meta/vector graphic format. Here's the lowdown:
Raster Raster image formats (RIFs) should be the most familiar to Internet users. A raster format breaks the image into a series of colored dots called pixels. The number of ones and zeros (bits) used to create each pixel denotes the depth of color you can put into your images. If your pixel is denoted with only one bit-per-pixel then that pixel must be black or white. Why? Because that pixel can only be a one or a zero, on or off, black or white. Bump that up to 4 bits-per-pixel and you're able to set that colored dot to one of 16 colors. If you go even higher, to 8 bits-per-pixel, you can save that colored dot at up to 256 different colors. Does that number, 256, sound familiar to anyone? That's the upper color level of a GIF image. Sure, you can go with less than 256 colors, but you cannot have over 256. That's why a GIF image doesn't work overly well for photographs and larger images. There are a whole lot more than 256 colors in the world. Images can carry millions. But if you want smaller icon images, GIFs are the way to go. Raster image formats can also save at 16, 24, and 32 bits-per-pixel. At the two highest levels, the pixels themselves can carry up to 16,777,216 different colors. The image looks great! Bitmaps saved at 24 bits-per-pixel are great quality images, but of course they also run about a megabyte per picture. There's always a trade-off, isn't there? The three main Internet formats, GIF, JPEG, and Bitmap, are all raster formats. Some other raster formats include the following:
CLP - Windows Clipart
DCX - ZSoft Paintbrush (multi-page)
DIB - OS/2 Warp format
FPX - Kodak FlashPix
IMG - GEM Paint format
JIF - JPEG-related image format
MAC - MacPaint
MSP - Microsoft Paint
PCT - Macintosh PICT format
PCX - ZSoft Paintbrush
PPM - Portable Pixel Map (UNIX)
PSP - Paint Shop Pro format
RAW - Unencoded image format
RLE - Run-Length Encoded image (used to lower image bit rates)
TIFF - Aldus Corporation format
WPG - WordPerfect image format
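The colour counts quoted in this section follow directly from the bits-per-pixel arithmetic, which a few lines can verify:

```python
# Each pixel stored with n bits can take 2**n distinct values,
# which is where the colour counts above come from.

def colours(bits_per_pixel):
    return 2 ** bits_per_pixel

assert colours(1) == 2            # black or white
assert colours(4) == 16
assert colours(8) == 256          # the GIF ceiling
assert colours(24) == 16_777_216  # "true colour"
```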
Pixels and the Web Since I brought up pixels, I thought now might be a pretty good time to talk about pixels and the Web. How much is too much? How many is too few? There is a delicate balance between the crispness of a picture and the number of pixels needed to display it. Let's say you have two images, each 5 inches across and 3 inches down. One uses 300 pixels to span that five inches, the other uses 1500. Obviously, the one with 1500 uses smaller pixels. It is also the one that offers a more crisp, detailed look. The more pixels, the more detailed the image will be. Of course, the more pixels, the more bytes the image will take up. So, how much is enough? That depends on whom you are speaking to, and right now you're speaking to me. I always go with 100 pixels per inch. That creates a ten-thousand-pixel square inch. I've found that allows for a pretty crisp image without going overboard on the bytes. It also allows some leeway to increase or decrease the size of the image and not mess it up too much. The lowest I'd go is 72 pixels per inch, the agreed-upon low end of the image scale. In terms of pixels per square inch, it's a whale of a drop to 5184. Try that. See if you like it, but I think you'll find that lower definition monitors really play havoc with the image.
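The 100-pixels-per-inch rule works out like this for the 5x3 inch example, assuming an uncompressed image at 24 bits per pixel:

```python
# Worked version of the pixel arithmetic above.

ppi = 100
width_px = 5 * ppi               # 500 pixels across
height_px = 3 * ppi              # 300 pixels down
per_square_inch = ppi ** 2       # the "ten-thousand pixel square inch"
total_pixels = width_px * height_px
size_bytes = total_pixels * 3    # 3 bytes per pixel at 24 bpp, about 440 KB

print(per_square_inch, total_pixels, size_bytes)
```

At 72 ppi the square inch drops to 72 * 72 = 5184 pixels, which is the figure quoted above.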
Meta/Vector Image Formats You may not have heard of this type of image formatting - not that you had heard of raster, either. This formatting falls into a lot of proprietary formats, formats made for specific programs. CorelDraw (CDR), Hewlett-Packard Graphics Language (HGL), and Windows Metafiles (EMF) are a few examples. Where the meta/vector formats have it over raster is that they are more than a simple grid of colored dots. They're actual vectors of data stored in mathematical formats rather than bits of colored dots. This allows for a strange shaping of colors and images that can be perfectly cropped on an arc. A squared-off map of dots cannot produce that arc as well. In addition, since the information is encoded in vectors, meta/vector image formats can
be blown up or down (a property known as "scalability") without looking jagged or crowded (a property known as "pixelating"). So that I do not receive e-mail from those in the computer image know, there is a difference between Meta and Vector formats. Vector formats can contain only vector data, whereas Meta files, as is implied by the name, can contain multiple formats. This means there can be a lovely Bitmap plopped right in the middle of your Windows Meta file. You'll never know or see the difference but, there it is. I'm just trying to keep everybody happy.
What's A Bitmap? I get that question a lot. Usually it's followed with "How come it only works on Microsoft Internet Explorer?" The second question's the easiest. Microsoft invented the Bitmap format. It would only make sense they would include it in their browser. Every time you boot up your PC, the majority of the images used in the process and on the desktop are Bitmaps. If you're using an MSIE browser, you can view this first example. The image is St. Sophia in Istanbul. The picture is taken from the city's hippodrome. Against what I said above, Bitmaps will display on all browsers, just not in the familiar
format we're all used to. I see Bitmaps used mostly as return images from PERL Common Gateway Interfaces (CGIs). A counter is a perfect example. Page counters that have that "odometer" effect are Bitmap images created by the server, rather than as an inline image. Bitmaps are perfect for this process because they're a simple series of colored dots. There's nothing fancy to building them. It's actually a fairly simple process. In the script that runs the counter, you "build" each number for the counter to display. Note the counter is black and white. That's only a one bit-per-pixel level image. To create the number zero in the counter above, you would build a grid 7 pixels wide by 10 pixels high. The pixels you want to remain black, you would denote as zero. Those you wanted white, you'd denote as one. Here's what it looks like:

0 0 0 0 0 0 0
0 0 1 1 1 0 0
0 1 1 1 1 1 0
0 1 1 0 1 1 0
0 1 1 0 1 1 0
0 1 1 0 1 1 0
0 1 1 0 1 1 0
0 1 1 1 1 1 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0
See the number zero in the graph above? I made it red so it would stand out a bit more. You create one of those patterns for the numbers 0 through 9. The PERL script then returns the Bitmap image representing the numbers and you get that neat little odometer effect. That's the concept of a Bitmap: a grid of colored points. The more bits per pixel, the fancier the Bitmap can be. Bitmaps are good images, but they're not great. If you've played with Bitmaps versus any other image formats, you might have noticed that the Bitmap format creates images that are a little heavy on the bytes. The reason is that the Bitmap format is not very efficient at storing data. What you see is pretty much what you get, one series of bits stacked on top of another.
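The digit grid described above can be reproduced and rendered in a few lines (0 marks a black pixel and 1 a white one, as in the text):

```python
# The counter digit "0" as a 7-wide by 10-high grid of bits.
ZERO = [
    "0000000",
    "0011100",
    "0111110",
    "0110110",
    "0110110",
    "0110110",
    "0110110",
    "0111110",
    "0011100",
    "0000000",
]

# Render to the terminal: "#" for a white (lit) pixel, "." for black.
for row in ZERO:
    print("".join("#" if bit == "1" else "." for bit in row))
```

A counter script holds one such pattern per digit and simply concatenates them to build the returned image.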
Compression I said above that a Bitmap is a simple series of pixels all stacked up. But the same image saved in GIF or JPEG format uses fewer bytes to make up the file. How? Compression. "Compression" is a computer term that represents a variety of mathematical formats used to compress an image's byte size. Let's say you have an image where the upper right-hand corner has four pixels all the same color. Why not find a way to make those four pixels into one? That would cut down the number of bytes by three-fourths, at least in the one corner. That's a compression factor. Bitmaps can be compressed to a point. The process is called "run-length encoding." Runs of pixels that are all the same color are combined into one pixel. The longer the run of pixels, the more compression. Bitmaps with little detail or color variance will really compress. Those with a great deal of detail don't offer much in the way
of compression. Bitmaps that use run-length encoding can carry either the common ".bmp" extension or ".rle". Another difference between the two files is that the common Bitmap can accept 16 million different colors per pixel. Saving the same image in run-length encoding knocks the bits-per-pixel down to 8. That locks the level of color in at no more than 256. That's even more compression of bytes to boot. Here's the same image of St. Sophia in common Bitmap and the run-length encoding format. Can you see a difference? In case you're wondering, the image was saved in Windows-version run-length encoding (there's also a CompuServe version) at 256 colors. It produced quite a drop in bytes, don't you think? And to be honest -- I really don't see a whole lot of difference. So, why not create a single pixel when all of the colors are close? You could even lower the number of colors available so that you would have a better chance of the pixels being close in color. Good idea. The people at CompuServe felt the same way.
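The run-length idea is simple enough to sketch directly. This toy version stores (count, value) pairs rather than the actual .rle file layout:

```python
# A minimal run-length encoder/decoder: runs of identical pixels
# collapse to (count, value) pairs. Long runs compress well; busy,
# detailed rows barely compress at all.

def rle_encode(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, p])       # start a new run
    return runs

def rle_decode(runs):
    out = []
    for count, value in runs:
        out.extend([value] * count)
    return out

row = [255, 255, 255, 255, 0, 0, 255]
encoded = rle_encode(row)             # [[4, 255], [2, 0], [1, 255]]
assert rle_decode(encoded) == row     # the round trip is lossless
```

Note that run-length encoding itself loses nothing; the color loss mentioned above comes from the separate drop to 8 bits per pixel.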
The GIF Image Formats So, why wasn't the Bitmap chosen as the King of all Internet Images? Because Bill Gates hadn't yet gotten into the fold when the earliest browsers started running inline images. I don't mean to be flippant either; I truly believe that. GIF, which stands for "Graphic Interchange Format," was first standardized in 1987 by CompuServe, although the patent for the algorithm (mathematical formula) used to create GIF compression actually belongs to Unisys. The first format of GIF used on the Web was called GIF87a, representing its year and version. It saved images at 8 bits-per-pixel, capping the color level at 256. That 8-bit level allowed the image to work across multiple server styles, including CompuServe, TCP/IP, and AOL. It was a graphic for all seasons, so to speak. CompuServe updated the GIF format in 1989 to include animation, transparency, and interlacing. They called the new format, you guessed it: GIF89a. There's no discernible difference between a basic (known as non-interlaced) GIF in 87 and 89 formats. See for yourself. The image is of me and another gentleman playing a Turkish Sitar. Even the bytes are the same. It's the transparency, animation, and interlacing additions to GIF89a that really set it apart. Let's look at each one.
Animation
I remember when animation really came into the mainstream of Web page development. I was deluged with email asking how to do it. There's been a tutorial up for a while now at http://www.htmlgoodies.com/tutors/animate.html. Stop by and see it for instruction on how to create the animations yourself. Here, we're going to quickly discuss the concepts of how it all works. What you are seeing in that example are 12 different images, each set one "hour" farther ahead than the one before it. Animate them all in a row and you get that stopwatch effect. The concept of GIF89a animation is much the same as a picture book with small animation cells in each corner. Flip the pages and the images appear to move. Here, you have the ability to set the cell's (technically called an "animation frame") movement speed in 1/100ths of a second. An internal clock embedded right into the GIF keeps count and flips the image when the time comes. The animation process has been bettered along the way by companies who have found their own method of compressing the GIFs further. As you watch an animation you might notice that very little changes from frame to frame. So, why put up a whole new GIF image if only a small section of the frame needs to be changed? That's the key to some of the newer compression factors in GIF animation. Less changing means fewer bytes.
Transparency Again, if you'd like a how-to, I have one for you at http://www.htmlgoodies.com/tutors/transpar.html. A transparent GIF is fun but limited in that only one color of the 256-shade palette can be made transparent. As you can see, the bytes came out the same after the image was put through the transparency filter. The process is best described as similar to the weather forecaster on your local news. Each night they stand in front of a big green (sometimes blue) screen and deliver the weather while that blue or green behind them is "keyed" out and replaced by another source. In the case of the weather forecaster, it's usually a large map with lots of Ls and Hs.
The process in television is called a "chroma key." A computer is told to hone in on a specific color; let's say it's green. Chroma key screens are usually green because it's the color least likely to be found in human skin tones. You don't want to use a blue screen and then chroma out someone's pretty blue eyes. That chroma (color) is then "erased" and replaced by another image. Think of that in terms of a transparent GIF. There are only 256 colors available in the GIF. The computer is told to hone in on one of them. It's done by choosing a particular red/green/blue shade already found in the image and blanking it out. The color is basically dropped from the palette that makes up the image. Thus whatever is behind it shows through. The shape is still there, though. Try this: Get an image with a transparent background and alter its height and width in your HTML code. You'll see what should be the transparent color seeping through. Any color that's found in the GIF can be made transparent, not just the color in the background. If the background of the image is speckled then the transparency is going to be speckled. If you cut out the color blue in the background, and that color also appears in the middle of the image, it too will be made transparent. When I put together a transparent image, I make the image first, then copy and paste it onto a slightly larger square. That square is the most hideous green I can mix up. I'm sure it doesn't appear in the image. That way only the background around the image will become clear.
Interlaced vs. Non-Interlaced GIF The GIF images of me playing the Turkish Sitar were non-interlaced format images. This is what is meant when someone refers to a "normal" GIF or just "GIF". When you do NOT interlace an image, you fill it in from the top to the bottom, one line after another. The following image is of two men coming onto a boat we used to cross from the European to the Asian side of Turkey. The flowers they are carrying were sold in the manner of roses we might buy our wives here in the U.S. I bought one. Hopefully, you're on a slower connection computer so you got the full effect of waiting for the image to come in. It can be torture sometimes. That's where the brilliant Interlaced GIF89a idea came from. Interlacing is the concept of filling in every other line of data, then going back to the top and doing it all again, filling in the lines you skipped. Your television works that way. The effect on a computer monitor is that the graphic appears blurry at first and then sharpens up as the other lines fill in. That allows your viewer to at least get an idea of what's coming up rather than waiting for the entire image, line by line. The example image below is of a spice shop in the Grand Covered Bazaar, Istanbul. Both interlaced and non-interlaced GIFs get you to the same destination. They just do it differently. It's up to you which you feel is better.
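One caveat: "every other line" is a simplification. The GIF89a specification actually defines a four-pass row ordering, which can be sketched as:

```python
# GIF89a interlacing uses four passes: pass 1 covers every 8th row
# starting at row 0, pass 2 every 8th row starting at row 4, pass 3
# every 4th row starting at row 2, and pass 4 every 2nd row starting
# at row 1.

def interlace_order(height):
    """Return the order in which an interlaced GIF's rows arrive."""
    order = []
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
        order.extend(range(start, height, step))
    return order

print(interlace_order(8))  # [0, 4, 2, 6, 1, 3, 5, 7]
```

The coarse early passes are what produce the blurry-then-sharpening effect described above.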
JPEG Image Formats JPEG is a compression algorithm developed by the people the format is named after, the Joint Photographic Experts Group. JPEG's big selling point is that its compression factor stores the image on the hard drive in fewer bytes than the image needs when it actually displays. The Web took to the format straightaway because not only did the image store in fewer bytes, it transferred in fewer bytes. As the Internet adage goes, the pipeline isn't getting any bigger, so we need to make what is traveling through it smaller. For a long while, GIF ruled the Internet roost. I was one of the people who didn't really like this new JPEG format when it came out. It was less grainy than GIF, but it also caused computers without a decent amount of memory to crash the browser. (JPEGs have to be "blown up" to their full size. That takes some memory.) There was a time when people only had 8 or 4 megs of memory in their boxes. Really. It was way back in the Dark Ages. JPEGs are "lossy." That's a term that means you trade off detail in the displayed picture for a smaller storage file. I always save my JPEGs at 50% or medium compression. Here's a look at the same image saved in normal, or what's called "sequential", encoding. That's a top-to-bottom, single-line scheme, equal to the GIF89a non-interlaced format. The image is of an open-air market in Basra. The smell was amazing. If you like olives, go to Turkey. Cucumbers, too, believe it or not. The difference between the 1% and 50% compression is not too bad, but the drop in bytes is impressive. The numbers I am showing are storage numbers, the amount of hard drive space the image takes up. You've probably already surmised that 50% compression means that 50% of the image is included in the algorithm. If you don't put a 50% compressed image next to an exact duplicate image at 1% compression, it looks pretty good. But what about that 99% compression image? It looks horrible, but it's great for teaching. Look at it again.
See how it appears to be made of blocks? That's what's meant by lossy: detail is thrown away in exchange for fewer bytes. You can see where the compression algorithm found groups of pixels that all appeared to be close in color and just grouped them together as one. You might be hard pressed to figure out what the image was actually showing if I didn't tell you.
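If you'd like to see where those blocks come from, here's a deliberately crude Python sketch. Real JPEG compresses 8x8 tiles with a DCT and quantization rather than a plain average, but at heavy compression the visible result is much the same: each tile collapses toward a single value.

```python
def average_blocks(pixels, block=8):
    """Crude sketch of block-based lossy compression: replace each
    block x block tile of a greyscale image (a list of rows of ints)
    with the tile's average value.  At large block sizes the image
    turns into the flat squares described above."""
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = [pixels[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            avg = sum(tile) // len(tile)
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = avg
    return out

# A 2x2 image collapses to one flat value when the block covers it:
print(average_blocks([[0, 10], [20, 30]], block=2))
```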
Progressive JPEGs

You can almost guess what this is all about. A progressive JPEG works a lot like the interlaced GIF89a, filling in every other line, then returning to the top of the image to fill in the remainder. The example is again presented three times, at 1%, 50%, and 99% compression. The image is of the port at Istanbul from our hotel rooftop.

Obviously, here's where bumping up the compression does not pay off. Rule of thumb: if you're going to use progressive JPEG, keep the quality up high; heavy compression and progressive display don't mix well.
JPEG (Joint Photographic Experts Group)

JPEG is a standardised image compression mechanism, designed for compressing either full-colour (24 bit) or grey-scale digital images of "natural" (real-world) scenes.
It works well on photographs, naturalistic artwork, and similar material; not so well on lettering, simple cartoons, or black-and-white line drawings (files come out very large). JPEG handles only still images, but there is a related standard called MPEG for motion pictures.

JPEG is "lossy", meaning that the image you get out of decompression isn't quite identical to what you originally put in. The algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small colour details aren't perceived as well as small details of light-and-dark. Thus, JPEG is intended for compressing images that will be looked at by humans.

A lot of people are scared off by the term "lossy compression". But when it comes to representing real-world scenes, no digital image format can retain all the information that impinges on your eyeball. By comparison with the real-world scene, JPEG loses far less information than GIF.

Quality v Compression
A useful property of JPEG is that the degree of lossiness can be varied by adjusting compression parameters. This means that the image maker can trade off file size against output image quality.

For good-quality, full-color source images, the default quality setting (Q 75) is very often the best choice. Try Q 75 first; if you see defects, then go up. Except for experimental purposes, never go above about Q 95; using Q 100 will produce a file two or three times as large as Q 95, but of hardly any better quality. If you see a file made with Q 100, it's a pretty sure sign that the maker didn't know what he/she was doing.

If you want a very small file (say for preview or indexing purposes) and are prepared to tolerate large defects, a Q setting in the range of 5 to 10 is about right. Q 2 or so may be amusing as "op art".
GIF (Graphics Interchange Format)

The Graphics Interchange Format was developed in 1987 by Compuserve, who needed a platform-independent image format that was suitable for transfer across slow connections. It is a compressed (lossless) format (it uses LZW compression) and compresses at a ratio of between 3:1 and 5:1.

It is an 8 bit format, which means the maximum number of colours supported by the format is 256.

There are two GIF standards, 87a and 89a (developed in 1987 and 1989 respectively). The 89a standard has additional features, such as improved interlacing, the ability to define one colour to be transparent, and the ability to store multiple images in one file to create a basic form of animation. Both Mosaic and Netscape will display 87a and 89a GIFs, but while both support transparency and interlacing, only Netscape supports animated GIFs.
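The LZW compression that GIF relies on builds a dictionary of byte strings it has already seen, so long runs of identical pixels -- flat colour areas, borders -- collapse dramatically. Here's a minimal Python sketch of the core encoder (real GIF adds variable-width codes plus clear and end-of-information markers on top of this):

```python
def lzw_encode(data):
    """Minimal LZW encoder over bytes: emit a code for the longest
    already-seen string, then extend the dictionary by one entry."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    out, cur = [], b""
    for byte in data:
        candidate = cur + bytes([byte])
        if candidate in table:
            cur = candidate
        else:
            out.append(table[cur])
            table[candidate] = next_code
            next_code += 1
            cur = bytes([byte])
    if cur:
        out.append(table[cur])
    return out

flat = b"\x07" * 100          # 100 identical pixels, e.g. a flat border
codes = lzw_encode(flat)
print(len(flat), "bytes ->", len(codes), "codes")   # 100 bytes -> 14 codes
```

The dictionary entries for the run grow one pixel longer with each code emitted, which is why a solid-colour area costs GIF almost nothing.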
PNG (Portable Network Graphics format)

In January 1995 Unisys, the holder of the patent on the LZW compression technique the GIF format uses, announced that they would be enforcing that patent. This means that commercial developers whose products include the GIF encoding or decoding algorithms have to pay a license fee to Unisys. This does not concern users of GIFs or non-commercial developers. However, a number of people banded together and created a completely patent-free graphics format called PNG (pronounced "ping"), the Portable Network Graphics format.

PNG is superior to GIF in that it has better compression and supports millions of colours. PNG files end in a .png suffix. PNG is supported in Netscape 4.03 and above. For more information, try the PNG home page.
When should I use JPEG, and when should I stick with GIF?

JPEG is not going to displace GIF entirely. For some types of images, GIF is superior in image quality, file size, or both. One of the first things to learn about JPEG is which kinds of images to apply it to.

Generally speaking, JPEG is superior to GIF for storing full-color or grey-scale images of "realistic" scenes; that means scanned photographs and similar material. Any continuous variation in color, such as occurs in highlighted or shaded areas, will be represented more faithfully and in less space by JPEG than by GIF.

GIF does significantly better on images with only a few distinct colors, such as line drawings and simple cartoons. Not only is GIF lossless for such images, but it often compresses them more than JPEG can. For example, large areas of pixels that are all exactly the same color are compressed very efficiently indeed by GIF. JPEG can't squeeze such data as much as GIF does without introducing visible defects. (One implication of this is that large single-color borders are quite cheap in GIF files, while they are best avoided in JPEG files.)

Computer-drawn images (ray-traced scenes, for instance) usually fall between photographs and cartoons in terms of complexity. The more complex and subtly rendered the image, the more
likely that JPEG will do well on it. The same goes for semi-realistic artwork (fantasy drawings and such).

JPEG has a hard time with very sharp edges: a row of pure-black pixels adjacent to a row of pure-white pixels, for example. Sharp edges tend to come out blurred unless you use a very high quality setting. Edges this sharp are rare in scanned photographs, but are fairly common in GIF files: borders, overlaid text, etc. The blurriness is particularly objectionable with text that's only a few pixels high. If you have a GIF with a lot of small-size overlaid text, don't JPEG it.

Plain black-and-white (two-level) images should never be converted to JPEG; they violate all of the conditions given above. You need at least about 16 grey levels before JPEG is useful for grey-scale images. It should also be noted that GIF is lossless for grey-scale images of up to 256 levels, while JPEG is not.
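Much of the advice above boils down to one question: how many distinct colors does the image actually contain? A little Python helper -- hypothetical, just to make the rule of thumb concrete, not a substitute for looking at the image:

```python
def distinct_colours(pixels):
    """Count distinct colours in an image given as rows of (r, g, b)."""
    return len({px for row in pixels for px in row})

def suggest_format(pixels):
    """Rough rule of thumb from the text: an image that fits in a
    256-colour palette (line art, cartoons) suits lossless GIF;
    continuous-tone colour suits JPEG."""
    return "GIF" if distinct_colours(pixels) <= 256 else "JPEG"
```

A two-colour line drawing comes back "GIF"; a smooth gradient or scanned photograph, with thousands of distinct colours, comes back "JPEG".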