Resmi N.G. Reference: Digital Image Processing, 2nd Edition, Rafael C. Gonzalez and Richard E. Woods
Overview
Introduction Fundamentals
Image Compression Models
Coding Redundancy, Interpixel Redundancy, Psychovisual Redundancy, Fidelity Criteria, Source Encoder and Decoder, Channel Encoder and Decoder
Elements of Information Theory
Measuring Information, The Information Channel, Fundamental Coding Theorems
Noiseless Coding Theorem, Noisy Coding Theorem, Source Coding Theorem
Error-Free Compression
Variable-Length Coding: Huffman Coding, Other Near-Optimal Variable-Length Codes, Arithmetic Coding
LZW Coding
Bit-Plane Coding: Bit-Plane Decomposition, Constant Area Coding, One-Dimensional Run-Length Coding, Two-Dimensional Run-Length Coding
Lossless Predictive Coding
Lossy Compression
Lossy Predictive Coding
Transform Coding
Transform Selection
Subimage Size Selection
Bit Allocation
Zonal Coding Implementation
Threshold Coding Implementation
Wavelet Coding
Wavelet Selection
Decomposition Level Selection
Quantizer Design
Image Compression Standards
Binary Image Compression Standards
One Dimensional Compression
Two Dimensional Compression
Continuous Tone Still Image Compression Standards
JPEG
Lossy Baseline Coding System
Extended Coding System
Lossless Independent Coding System
JPEG 2000
Video Compression Standards
Introduction
Need for Compression
Huge amount of digital data
Difficult to store and transmit
Solution
Reduce the amount of data required to represent a digital image
Remove redundant data
Transform the data prior to storage and transmission
Categories
Information Preserving
Lossy Compression
Fundamentals
Data compression
Difference between data and information
Data Redundancy
If n1 and n2 denote the number of information-carrying units in two datasets that represent the same information, the relative data redundancy R_D of the first dataset is defined as
R_D = 1 − 1/C_R
where C_R = n1/n2 is called the compression ratio.
Case 1: n2 = n1 ⟹ C_R = 1 and R_D = 0 (no redundant data)
Case 2: n2 << n1 ⟹ C_R → ∞ and R_D → 1 (highly redundant data; significant compression)
Case 3: n2 >> n1 ⟹ C_R → 0 and R_D → −∞ (the second dataset contains more data than the original)
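A short numerical sketch of these definitions (the bit counts are hypothetical):

```python
# Hypothetical example: an 8-bit image re-coded with a variable-length code.
n1 = 256 * 256 * 8      # bits in the original representation
n2 = 256 * 256 * 5.5    # bits after coding (assumed average of 5.5 bits/pixel)

C_R = n1 / n2           # compression ratio
R_D = 1 - 1 / C_R       # relative data redundancy of the first dataset

print(f"C_R = {C_R:.3f}, R_D = {R_D:.3f}")   # C_R ~ 1.455, R_D ~ 0.313
```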
Coding Redundancy
Let a discrete random variable r_k in [0, 1] represent the gray levels of an image, and let p_r(r_k) denote the probability of occurrence of r_k:
p_r(r_k) = n_k / n,   k = 0, 1, 2, ..., L−1
If the number of bits used to represent each value of r_k is l(r_k), then the average number of bits required to represent each pixel is
L_avg = Σ_{k=0}^{L−1} l(r_k) p_r(r_k)
Hence, the total number of bits required to code an MxN image is MNLavg.
For representing an image using an m-bit binary code, Lavg= m.
How to achieve data compression? Variable-length coding: assign fewer bits to the more probable gray levels than to the less probable ones.
Find Lavg, compression ratio and redundancy.
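A sketch of that computation, with assumed gray-level probabilities and variable-length codeword lengths (illustrative values only):

```python
# Hypothetical 3-bit image (L = 8 gray levels) with an assumed variable-length code.
p = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]   # p_r(r_k), sums to 1
l = [2,    2,    2,    3,    4,    5,    6,    6   ]   # l(r_k), assumed code lengths

L_avg = sum(pk * lk for pk, lk in zip(p, l))           # average bits per pixel
C_R = 3 / L_avg                                        # vs. the 3-bit natural code
R_D = 1 - 1 / C_R

print(f"L_avg = {L_avg:.2f} bits/pixel, C_R = {C_R:.2f}, R_D = {R_D:.3f}")
# L_avg = 2.70 bits/pixel, C_R = 1.11, R_D = 0.099
```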
Interpixel Redundancy
Related to interpixel correlation within an image.
The value of a pixel in the image can be reasonably predicted from the values of its neighbours.
The gray levels of neighboring pixels are roughly the same and by knowing gray level value of one of the neighborhood pixels one has a lot of information about gray levels of other neighborhood pixels.
Information carried by individual pixels is relatively small. These dependencies between values of pixels in the image are called interpixel redundancy .
Autocorrelation
The autocorrelation coefficients along a single line of the image are computed as
γ(Δn) = A(Δn) / A(0)
where, for a line of N pixels,
A(Δn) = 1/(N − Δn) · Σ_{y=0}^{N−1−Δn} f(x, y) f(x, y + Δn)
To reduce interpixel redundancy, transform it into an efficient format.
Example: The differences between adjacent pixels can be used to represent the image.
Transformations that remove interpixel redundancies are termed mappings.
If the original image can be reconstructed from the transformed dataset, these mappings are called reversible mappings.
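A minimal sketch of the difference mapping mentioned above, for one row of 8-bit pixels; because the original row is recovered exactly, this is a reversible mapping:

```python
# Reversible difference mapping along one image row.
row = [127, 128, 128, 130, 131, 131, 129]

# Forward mapping: keep the first pixel, then store adjacent differences.
diffs = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

# Inverse mapping: cumulative sum reconstructs the original row exactly.
recon = [diffs[0]]
for d in diffs[1:]:
    recon.append(recon[-1] + d)

assert recon == row
print(diffs)   # [127, 1, 0, 2, 1, 0, -2]
```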
Psychovisual Redundancy
Based on human perception.
Associated with real or quantifiable visual information.
Elimination of psychovisual redundancy results in loss of quantitative information. This is referred to as quantization.
Quantization – mapping of a broad range of input values to a limited number of output values.
Results in lossy data compression.
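A minimal sketch of such a quantizer, assuming 8-bit input mapped onto 16 reconstruction levels (the step size and mid-point reconstruction are illustrative choices, not a prescribed design):

```python
# Uniform quantization: 256 input gray levels -> 16 output levels.
def quantize(pixel, levels=16, max_val=256):
    step = max_val // levels                    # 16 gray levels per bin
    return (pixel // step) * step + step // 2   # mid-point reconstruction value

print([quantize(v) for v in (0, 7, 100, 200, 255)])   # [8, 8, 104, 200, 248]
```

The discarded precision cannot be recovered, which is why quantization is irreversible.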
Fidelity Criteria
Objective fidelity criteria
When the level of information loss can be expressed as a function of the original (input) image and the compressed and subsequently decompressed output image.
Example: Root Mean Square error between input and output images.
The error between input image f(x, y) and output image f̂(x, y) is
e(x, y) = f̂(x, y) − f(x, y)

Root-mean-square error:
e_rms = [ (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} ( f̂(x, y) − f(x, y) )² ]^(1/2)

Mean-Square Signal-to-Noise Ratio:
SNR_ms = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f̂(x, y)² / Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} ( f̂(x, y) − f(x, y) )²
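These two criteria translate directly into code; a short sketch assuming NumPy and two equal-size images (names and test data are illustrative):

```python
import numpy as np

def objective_fidelity(f, f_hat):
    """Root-mean-square error and mean-square SNR between input f and output f_hat."""
    err = f_hat.astype(float) - f.astype(float)
    e_rms = np.sqrt(np.mean(err ** 2))
    snr_ms = np.sum(f_hat.astype(float) ** 2) / np.sum(err ** 2)
    return e_rms, snr_ms

# Toy usage with random 8-bit images (illustration only).
f = np.random.randint(0, 256, (8, 8))
f_hat = np.clip(f + np.random.randint(-3, 4, (8, 8)), 0, 255)
print(objective_fidelity(f, f_hat))
```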
Subjective fidelity criteria
Measures image quality by subjective evaluations of a human observer.
Image Compression Models
Encoder – Source encoder + Channel encoder
Source encoder – removes coding, interpixel, and psychovisual redundancies in input image and outputs a set of symbols.
Channel encoder – To increase the noise immunity of the output of source encoder.
Decoder - Channel decoder + Source decoder
Source Encoder
Mapper
Transforms input data into a format designed to reduce interpixel redundancies in the input image.
Generally a reversible process.
May or may not directly reduce the amount of data required to represent the image.
Examples (a short sketch follows this list):
Run-length coding (directly results in data compression)
Transform coding
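As an illustration of the run-length example above, a minimal run-length coder for one row of pixels (a sketch only, not the scheme used by any particular standard):

```python
def rle_encode(row):
    """Encode a row of pixel values as (value, run_length) pairs."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [tuple(r) for r in runs]

print(rle_encode([0, 0, 0, 255, 255, 0, 0, 0, 0]))   # [(0, 3), (255, 2), (0, 4)]
```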
Quantizer
Reduces the accuracy of the mapper’s output in accordance with some pre-established fidelity criterion.
Reduces the psychovisual redundancies of the input image.
Irreversible process (irreversible information loss)
Must be omitted when error-free compression is desired.
Symbol encoder
Creates a fixed- or variable-length code to represent the quantizer output and maps the output in accordance with the code.
Usually, a variable-length code is used to represent the mapped and quantized output.
Assigns the shortest codewords to the most frequently occurring output values. Reduces coding redundancy.
Reversible process
Source decoder
Symbol decoder
Inverse Mapper
Inverse operations are performed in the reverse order.
Channel Encoder and Decoder
Essential when the channel is noisy or error-prone.
Source encoded data – highly sensitive to channel noise.
Channel encoder reduces the impact of channel noise by inserting controlled form of redundancy into the source encoded data.
Example
Hamming code – appends enough bits to the data being encoded to ensure that two valid codewords differ by a minimum number of bits.
7-bit Hamming(7,4) Code
7-bit codewords = 4-bit word + 3 bits of redundancy.
The distance between two valid codewords (the minimum number of bit changes required to change one codeword into another) is 3.
All single-bit errors can be detected and corrected.
The Hamming distance between two codewords is the number of places in which the codewords differ. The minimum distance of a code is the minimum number of bit changes between any two codewords. The Hamming weight of a codeword is the number of non-zero elements (1's) in the codeword.
Binary data b3b2b1b0    Hamming codeword h1h2h3h4h5h6h7
0000                    0000000
0001                    1101001
0010                    0101010
0011                    1000011
0100                    1001100
0101                    0100101
0110                    1100110
0111                    0001111
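The table above is consistent with the standard Hamming(7,4) parity assignments h1 = b3⊕b2⊕b0, h2 = b3⊕b1⊕b0, h4 = b2⊕b1⊕b0, with b3, b2, b1, b0 carried in positions h3, h5, h6, h7. A minimal encoder sketch under that convention (function name is ours):

```python
def hamming74_encode(b3, b2, b1, b0):
    """Return the 7-bit codeword h1..h7 for the 4-bit word b3 b2 b1 b0."""
    h1 = b3 ^ b2 ^ b0    # parity check covering b3, b2, b0
    h2 = b3 ^ b1 ^ b0    # parity check covering b3, b1, b0
    h4 = b2 ^ b1 ^ b0    # parity check covering b2, b1, b0
    return (h1, h2, b3, h4, b2, b1, b0)

# Reproduces the table rows, e.g. 0001 -> 1101001 and 0110 -> 1100110.
print(''.join(map(str, hamming74_encode(0, 0, 0, 1))))   # 1101001
print(''.join(map(str, hamming74_encode(0, 1, 1, 0))))   # 1100110
```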
Basics of Probability
Ref: http://en.wikipedia.org/wiki/Probability
Elements of Information Theory
Measuring Information
A random event E occurring with probability P(E) is said to contain
I(E) = log(1/P(E)) = −log P(E)
units of information.
I(E) is called the self-information of E.
Amount of self-information of an event E is inversely related to its probability.
If P(E) = 1, I(E) = 0. That is, there is no uncertainty associated with the event.
No information is conveyed because it is certain that the event will occur.
If base m logarithm is used, the measurement is in m-ary units.
If base is 2, the measurement is in binary units. The unit of information is called a bit.
If P(E) = ½, I(E) = −log(½) = 1 bit. That is, 1 bit of information is conveyed when one of two possible equally likely outcomes occurs.
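A tiny sketch checking these values (base-2 logarithm, so the unit is bits; the helper name is ours):

```python
import math

def self_information(p):
    """Self-information I(E) = -log2 P(E), in bits."""
    return 0.0 if p == 1 else -math.log2(p)

print(self_information(0.5))   # 1.0 bit
print(self_information(1.0))   # 0.0 -> a certain event carries no information
```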
The Information Channel
Information channel is the physical medium that connects the information source to the user of information.
Self-information is transferred between an information source and a user of the information, through the information channel.
Information source – Generates a random sequence of symbols from a finite or countably infinite set of possible symbols.
Output of the source is a discrete random variable.
The set of source symbols or letters {a1, a2, …, aJ} is referred to as the source alphabet A.
The probability of the event that the source will produce symbol aj is P(aj), with
Σ_{j=1}^{J} P(aj) = 1
The J×1 vector z = [P(a1), P(a2), ..., P(aJ)]^T is used to represent the set of all source symbol probabilities.
The finite ensemble (A, z) describes the information source completely.
The probability that the discrete source will emit symbol a j is P(a j).
Therefore, the self-information generated by the production of a single source symbol is I(aj) = −log P(aj).
If k source symbols are generated, the average self-information obtained from the k outputs is
−k P(a1) log P(a1) − k P(a2) log P(a2) − ... − k P(aJ) log P(aJ) = −k Σ_{j=1}^{J} P(aj) log P(aj)
The average information per source output, denoted as H(z), is
H(z) = E[I(z)] = Σ_{j=1}^{J} P(aj) I(aj) = Σ_{j=1}^{J} P(aj) log(1/P(aj)) = −Σ_{j=1}^{J} P(aj) log P(aj)
This is called the uncertainty or entropy of the source.
It is the average amount of information (in m-ary units per symbol) obtained by observing a single source output.
If the source symbols are equally probable, the entropy is maximized and the source provides maximum possible average information per source symbol.
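A short sketch (function name is ours) computing H(z) for a few distributions; the uniform distribution gives the largest value, as stated above:

```python
import math

def entropy(probs):
    """First-order entropy H(z) = -sum p log2 p, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol (maximum for J = 4)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0 bits/symbol (no uncertainty)
```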
A simple information system
Output of the channel is also a discrete random variable which takes on values from a finite or countably infinite set of symbols {b1, b2, …, bK} called the channel alphabet B.
The finite ensemble (B, v), where v = [P(b1), P(b2), ..., P(bK)]^T, describes the channel output completely and thus the information received by the user.
The probability P(bk) of a given channel output and the probability distribution of the source z are related as
P(bk) = Σ_{j=1}^{J} P(bk | aj) P(aj)
where P(bk | aj) is the conditional probability that the output symbol bk is received, given that the source symbol aj was generated.
Forward Channel Transition Matrix or Channel Matrix
Q = [ P(b1|a1)  P(b1|a2)  ...  P(b1|aJ)
      P(b2|a1)  P(b2|a2)  ...  P(b2|aJ)
         :         :      ...     :
      P(bK|a1)  P(bK|a2)  ...  P(bK|aJ) ]
Matrix element: qkj = P(bk | aj)
The probability distribution of the output alphabet can be computed from v = Qz
Entropy:
H(z) = E[I(z)] = Σ_{j=1}^{J} P(aj) I(aj) = −Σ_{j=1}^{J} P(aj) log P(aj)

Conditional entropy function:
H(z | bk) = E[I(z | bk)] = Σ_{j=1}^{J} P(aj | bk) I(aj | bk) = −Σ_{j=1}^{J} P(aj | bk) log P(aj | bk)
where P(aj | bk) is the probability that symbol aj was transmitted by the source, given that the user receives bk.
The expected (average) value over all bk is
H(z | v) = Σ_{k=1}^{K} H(z | bk) P(bk)
         = −Σ_{k=1}^{K} Σ_{j=1}^{J} P(aj | bk) log P(aj | bk) P(bk)
         = −Σ_{k=1}^{K} Σ_{j=1}^{J} P(aj | bk) P(bk) log P(aj | bk)

Using the conditional probability P(aj | bk) = P(aj, bk) / P(bk),
H(z | v) = −Σ_{k=1}^{K} Σ_{j=1}^{J} P(aj, bk) log P(aj | bk)
P(aj, bk) is the joint probability of aj and bk, that is, the probability that aj is transmitted and bk is received.
Mutual information
H(z) is the average information per source symbol, assuming no knowledge of the output symbol.
H(z|v) is the average information per source symbol, assuming observation of the output symbol.
The difference between H(z) and H(z|v) is the average information received upon observing the output symbol, and is called the mutual information of z and v, given by
I(z|v) = H(z) - H(z|v)
I(z | v) = H(z) − H(z | v)
         = −Σ_{j=1}^{J} P(aj) log P(aj) + Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log P(aj | bk)

Since P(aj) = P(aj, b1) + P(aj, b2) + ... + P(aj, bK) = Σ_{k=1}^{K} P(aj, bk),

I(z | v) = −Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log P(aj) + Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log P(aj | bk)
         = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log [ P(aj | bk) / P(aj) ]
         = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log [ P(aj, bk) / ( P(aj) P(bk) ) ]

Using P(aj, bk) = P(aj | bk) P(bk) = P(bk | aj) P(aj),

I(z | v) = Σ_{j=1}^{J} Σ_{k=1}^{K} P(bk | aj) P(aj) log [ P(bk | aj) P(aj) / ( P(aj) P(bk) ) ]
         = Σ_{j=1}^{J} Σ_{k=1}^{K} qkj P(aj) log [ qkj / P(bk) ]

and, since P(bk) = Σ_{j=1}^{J} P(bk | aj) P(aj),

I(z | v) = Σ_{j=1}^{J} Σ_{k=1}^{K} qkj P(aj) log [ qkj / Σ_{i=1}^{J} qki P(ai) ]
The minimum possible value of I( z|v) is zero.
Occurs when the input and output symbols are statistically independent, that is, when P(aj, bk) = P(aj) P(bk).
In that case,
I(z | v) = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log [ P(aj, bk) / ( P(aj) P(bk) ) ]
         = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log [ P(aj) P(bk) / ( P(aj) P(bk) ) ]
         = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log 1 = 0
Channel Capacity
The maximum value of I(z | v) over all possible choices of source probabilities in the vector z is called the capacity, C, of the channel described by channel matrix Q:
C = max_z [ I(z | v) ]
Channel capacity is the maximum rate at which information can be transmitted reliably through the channel.
Binary information source
Binary Symmetric Channel (BSC)
Binary Information Source
Source alphabet: A = {a1, a2} = {0, 1}
P(a1) = p_bs, P(a2) = 1 − p_bs
Entropy of the source: H(z) = −p_bs log2 p_bs − (1 − p_bs) log2 (1 − p_bs)
where z = [P(a1), P(a2)]^T = [p_bs, 1 − p_bs]^T

The quantity −p_bs log2 p_bs − (1 − p_bs) log2 (1 − p_bs) is called the binary entropy function, denoted H_bs(·). For example, H_bs(t) = −t log2 t − (1 − t) log2 (1 − t).
Binary Symmetric Channel (Noisy Binary Information Channel)
Let the probability of error during transmission of any symbol be pe.
Channel matrix for the BSC:
Q = [ P(b1|a1)  P(b1|a2) ]   [ P(0|0)  P(0|1) ]   [ 1−pe   pe  ]
    [ P(b2|a1)  P(b2|a2) ] = [ P(1|0)  P(1|1) ] = [ pe    1−pe ]
Output alphabet: B = {b1, b2} = {0, 1}, with v = [P(b1), P(b2)]^T = [P(0), P(1)]^T
The probabilities of receiving output symbols b1 and b2 are determined by v = Qz:
P(0) = (1 − pe) p_bs + pe (1 − p_bs)
P(1) = pe p_bs + (1 − pe)(1 − p_bs)
The mutual information of the BSC can be computed from
I(z | v) = Σ_{j=1}^{2} Σ_{k=1}^{2} qkj P(aj) log2 [ qkj / Σ_{i=1}^{2} qki P(ai) ]
= q11 P(a1) log2[ q11 / (q11 P(a1) + q12 P(a2)) ] + q21 P(a1) log2[ q21 / (q21 P(a1) + q22 P(a2)) ]
  + q12 P(a2) log2[ q12 / (q11 P(a1) + q12 P(a2)) ] + q22 P(a2) log2[ q22 / (q21 P(a1) + q22 P(a2)) ]
Substituting the BSC values q11 = q22 = 1 − pe, q12 = q21 = pe, P(a1) = p_bs, P(a2) = 1 − p_bs, and simplifying gives
I(z | v) = H_bs( (1 − pe) p_bs + pe (1 − p_bs) ) − H_bs( pe )
where H_bs(·) is the binary entropy function defined above.
Capacity of BSC
Maximum of the mutual information over all source distributions.
I(z | v) is maximum when p_bs = 1/2, which corresponds to z = [1/2, 1/2]^T. Then
I(z | v) = H_bs( (1 − pe) · 1/2 + pe · 1/2 ) − H_bs(pe)
         = H_bs(1/2) − H_bs(pe)
         = log2 2 − H_bs(pe)
         = 1 − H_bs(pe)
so the capacity of the BSC is C = 1 − H_bs(pe) (in bits per symbol).
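A small numerical check of this result (a sketch; the helper names are ours), evaluating the mutual information of a BSC for several source distributions and comparing against 1 − H_bs(pe):

```python
import math

def Hbs(t):
    """Binary entropy function, in bits."""
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def bsc_mutual_info(p_bs, p_e):
    """I(z|v) for a binary symmetric channel with error probability p_e."""
    p0 = (1 - p_e) * p_bs + p_e * (1 - p_bs)    # P(output = 0)
    return Hbs(p0) - Hbs(p_e)

p_e = 0.1
for p_bs in (0.1, 0.3, 0.5, 0.7):
    print(p_bs, round(bsc_mutual_info(p_bs, p_e), 4))

print("capacity:", round(1 - Hbs(p_e), 4))      # maximum, attained at p_bs = 0.5
```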
Fundamental Coding Theorems
The Noiseless Coding Theorem or Shannon’s First Theorem or Shannon’s Source Coding Theorem for Lossless Data Compression
Applies when both the information channel and the communication system are error-free.
Defines the minimum average codeword length per source symbol that can be achieved.
Aim: to represent the source as compactly as possible.
Let the information source (A, z), with statistically independent source symbols, output an n-tuple of symbols from the source alphabet A. Then the source output takes on one of the J^n possible values, denoted αi, from
A′ = {α1, α2, α3, ..., α_{J^n}}
The probability of a given αi, P(αi), is related to the single-symbol probabilities as
P(αi) = P(aj1) P(aj2) ... P(ajn),   z′ = {P(α1), P(α2), ..., P(α_{J^n})}
The entropy of the source is given by
H(z′) = −Σ_{i=1}^{J^n} P(αi) log P(αi)
      = −Σ_{i=1}^{J^n} P(aj1) P(aj2) ... P(ajn) log [ P(aj1) P(aj2) ... P(ajn) ]
      = n H(z)
Hence, the entropy of the zero-memory source is n times the entropy of the corresponding single symbol source. Such a source is called the n th extension of single-symbol source.
The self-information of αi is log(1/P(αi)).
αi is therefore represented by a codeword whose length l(αi) is the smallest integer exceeding the self-information of αi. That is,
log(1/P(αi)) ≤ l(αi) < log(1/P(αi)) + 1
Multiplying by P(αi) and summing over all i,
Σ_{i=1}^{J^n} P(αi) log(1/P(αi)) ≤ Σ_{i=1}^{J^n} P(αi) l(αi) < Σ_{i=1}^{J^n} P(αi) log(1/P(αi)) + 1
H(z′) ≤ L′avg < H(z′) + 1
where L′avg = Σ_{i=1}^{J^n} P(αi) l(αi).
Since H(z′) = n H(z), dividing by n gives
H(z) ≤ L′avg / n < H(z) + 1/n
and therefore
lim_{n→∞} [ L′avg / n ] = H(z)
Shannon’s source coding theorem for lossless data compression states that, for any code used to represent the symbols from a source, the minimum number of bits required to represent the source symbols on average must be at least equal to the entropy of the source:
H(z) ≤ L′avg / n < H(z) + 1/n
The efficiency of any encoding strategy can be defined as
η = n H(z) / L′avg = H(z′) / L′avg
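A brief numerical illustration of this efficiency measure for n = 1, using an assumed 4-symbol source and codeword lengths (illustrative values only):

```python
import math

p = [0.5, 0.25, 0.125, 0.125]   # assumed source symbol probabilities
l = [1, 2, 3, 3]                # assumed codeword lengths (e.g. 0, 10, 110, 111)

H = -sum(pi * math.log2(pi) for pi in p)       # source entropy, bits/symbol
L_avg = sum(pi * li for pi, li in zip(p, l))   # average codeword length

print(H, L_avg, H / L_avg)   # 1.75 1.75 1.0 -> this code achieves efficiency 1
```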
The Noisy Coding Theorem or Shannon’s Second Theorem
Applies when the channel is noisy or prone to error.
Aim: to encode information so that the communication is made reliable and the error is minimized.
Use of a repetitive coding scheme (see the sketch below).
Encode the nth extension of the source using K-ary code sequences of length r, with K^r ≤ J^n.
Select only φ of the K^r possible code sequences as valid codewords.
A zero-memory information source generates information at a rate equal to its entropy.
The nth extension of the source provides information at a rate of H(z′)/n information units per symbol.
If the information is coded, the maximum rate of coded information is (log φ)/r and occurs when the φ valid codewords used to code the source are equally probable.
Hence, a code of size φ and block length r is said to have a rate of R = (log φ)/r information units per symbol.
The noisy coding theorem thus states that, for any R < C (where C is the channel capacity), there exists an integer r and a code of block length r and rate R such that the probability of a block decoding error is less than or equal to ε, for any ε > 0.
That is, the probability of error can be made arbitrarily small so long as the coded message rate is less than the capacity of the channel.
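To make the repetition idea concrete, a small simulation sketch (assuming a 3× repetition code with majority-vote decoding over a BSC; all names and parameters are illustrative):

```python
import random

def bsc(bit, p_e):
    """Flip the bit with probability p_e (binary symmetric channel)."""
    return bit ^ (random.random() < p_e)

def send_with_repetition(bit, p_e, r=3):
    """Transmit the bit r times and decode by majority vote."""
    received = [bsc(bit, p_e) for _ in range(r)]
    return int(sum(received) > r // 2)

random.seed(0)
p_e, trials = 0.1, 100_000
errors = sum(send_with_repetition(1, p_e) != 1 for _ in range(trials))
# Uncoded bit error rate is p_e = 0.1; with 3x repetition it drops to roughly
# 3*p_e^2*(1 - p_e) + p_e^3 = 0.028, at the cost of tripling the data rate.
print(errors / trials)
```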
The Source Coding Theorem for Lossy Data Compression
Applies when the channel is error-free, but the communication process is lossy.
Aim: information compression.
To determine the smallest rate at which information about the source can be conveyed to the user.
To encode the source so that the average distortion is less than a maximum allowable level D.
Let the information source and decoder output be defined by (A, z) and (B, v), respectively. A nonnegative cost function ρ(aj, bk), called the distortion measure, is used to define the penalty associated with reproducing source output aj with decoder output bk.
The average value of the distortion is given by
d(Q) = Σ_{j=1}^{J} Σ_{k=1}^{K} ρ(aj, bk) P(aj, bk) = Σ_{j=1}^{J} Σ_{k=1}^{K} ρ(aj, bk) P(aj) qkj
where Q is the channel matrix.

The rate distortion function R(D) is defined as
R(D) = min_{Q ∈ Q_D} I(z, v)
where Q_D = { qkj | d(Q) ≤ D } is the set of all D-admissible encoding-decoding procedures.
If D = 0, R(D) is less than or equal to the entropy of the source, i.e., R(0) ≤ H(z).
R(D) = min_{Q ∈ Q_D} I(z, v) defines the minimum rate at which information can be conveyed to the user, subject to the constraint that the average distortion be less than or equal to D.
I(z, v) is minimized subject to: qkj ≥ 0, Σ_{k=1}^{K} qkj = 1, and d(Q) = D.
d(Q) = D indicates that the minimum information rate occurs when the maximum possible distortion is allowed.
Shannon’s Source Coding Theorem for Lossy Data Compression states that, for a given source (with all its statistical properties known) and a given distortion measure, there is a function R(D), called the rate distortion function, such that if D is the tolerable amount of distortion, then R(D) is the best possible compression rate.
The theory of lossy data compression is also known as rate distortion theory.
The lossless data compression theory and lossy data compression theory are collectively known as the source coding theory .
Thank You