Resmi N.G. Reference: Digital Image Processing, 2nd Edition, Rafael C. Gonzalez and Richard E. Woods
Overview
Introduction Fundamentals
Image Compression Models
Coding Redundancy, Interpixel Redundancy, Psychovisual Redundancy, Fidelity Criteria, Source Encoder and Decoder, Channel Encoder and Decoder
Elements of Information Theory
Measuring Information, The Information Channel, Fundamental Coding Theorems
Noiseless Coding Theorem, Noisy Coding Theorem, Source Coding Theorem
Error-Free Compression
Variable-Length Coding: Huffman Coding, Other Near-Optimal Variable-Length Codes, Arithmetic Coding
LZW Coding
Bit-Plane Coding: Bit-Plane Decomposition, Constant Area Coding, One-Dimensional Run-Length Coding, Two-Dimensional Run-Length Coding
Lossless Predictive Coding
Lossy Compression
Lossy Predictive Coding
Transform Coding
Transform Selection
Subimage Size Selection
Bit Allocation
Zonal Coding Implementation
Threshold Coding Implementation
Wavelet Coding
Wavelet Selection
Decomposition Level Selection
Quantizer Design
Image Compression Standards
Binary Image Compression Standards
One Dimensional Compression
Two Dimensional Compression
Continuous Tone Still Image Compression Standards
JPEG
Lossy Baseline Coding System
Extended Coding System
Lossless Independent Coding System
JPEG 2000
Video Compression Standards
Introduction
Need for Compression
Huge amount of digital data
Difficult to store and transmit
Solution
Reduce the amount of data required to represent a digital image
Remove redundant data
Transform the data prior to storage and transmission
Categories
Information Preserving
Lossy Compression
Fundamentals
Data compression
Difference between data and information
Data Redundancy
If n1 and n2 denote the number of information-carrying units in two datasets that represent the same information, the relative data redundancy R_D of the first dataset is defined as
R_D = 1 − 1/C_R
where C_R = n1/n2 is called the compression ratio.
Case 1: n2 = n1 ⟹ C_R = 1 and R_D = 0 (no redundant data)
Case 2: n2 << n1 ⟹ C_R → ∞ and R_D → 1 (highly redundant data; significant compression)
Case 3: n2 >> n1 ⟹ C_R → 0 and R_D → −∞ (the second dataset contains more data than the original)
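A short numerical sketch of these definitions (the bit counts are hypothetical):

```python
# Hypothetical example: an 8-bit image re-coded with a variable-length code.
n1 = 256 * 256 * 8      # bits in the original representation
n2 = 256 * 256 * 5.5    # bits after coding (assumed average of 5.5 bits/pixel)

C_R = n1 / n2           # compression ratio
R_D = 1 - 1 / C_R       # relative data redundancy of the first dataset

print(f"C_R = {C_R:.3f}, R_D = {R_D:.3f}")   # C_R ~ 1.455, R_D ~ 0.313
```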
Coding Redundancy
Let a discrete random variable r_k in [0, 1] represent the gray levels of an image, and let p_r(r_k) denote the probability of occurrence of r_k:
p_r(r_k) = n_k / n,   k = 0, 1, 2, ..., L−1
If the number of bits used to represent each value of r_k is l(r_k), then the average number of bits required to represent each pixel is
L_avg = Σ_{k=0}^{L−1} l(r_k) p_r(r_k)
Hence, the total number of bits required to code an MxN image is MNLavg.
For representing an image using an m-bit binary code, Lavg= m.
How to achieve data compression? Variable-length coding: assign fewer bits to the more probable gray levels than to the less probable ones.
Find Lavg, compression ratio and redundancy.
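A sketch of that computation, with assumed gray-level probabilities and variable-length codeword lengths (illustrative values only):

```python
# Hypothetical 3-bit image (L = 8 gray levels) with an assumed variable-length code.
p = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]   # p_r(r_k), sums to 1
l = [2,    2,    2,    3,    4,    5,    6,    6   ]   # l(r_k), assumed code lengths

L_avg = sum(pk * lk for pk, lk in zip(p, l))           # average bits per pixel
C_R = 3 / L_avg                                        # vs. the 3-bit natural code
R_D = 1 - 1 / C_R

print(f"L_avg = {L_avg:.2f} bits/pixel, C_R = {C_R:.2f}, R_D = {R_D:.3f}")
# L_avg = 2.70 bits/pixel, C_R = 1.11, R_D = 0.099
```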
Interpixel Redundancy
Related to interpixel correlation within an image.
The value of a pixel in the image can be reasonably predicted from the values of its neighbours.
The gray levels of neighboring pixels are roughly the same and by knowing gray level value of one of the neighborhood pixels one has a lot of information about gray levels of other neighborhood pixels.
Information carried by individual pixels is relatively small. These dependencies between values of pixels in the image are called interpixel redundancy .
Autocorrelation
The autocorrelation coefficients along a single line of the image are computed as
γ(Δn) = A(Δn) / A(0)
where, for a line of N pixels,
A(Δn) = 1/(N − Δn) · Σ_{y=0}^{N−1−Δn} f(x, y) f(x, y + Δn)
To reduce interpixel redundancy, transform it into an efficient format.
Example: The differences between adjacent pixels can be used to represent the image.
Transformations that remove interpixel redundancies are termed mappings.
If the original image can be reconstructed from the transformed dataset, these mappings are called reversible mappings.
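A minimal sketch of the difference mapping mentioned above, for one row of 8-bit pixels; because the original row is recovered exactly, this is a reversible mapping:

```python
# Reversible difference mapping along one image row.
row = [127, 128, 128, 130, 131, 131, 129]

# Forward mapping: keep the first pixel, then store adjacent differences.
diffs = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

# Inverse mapping: cumulative sum reconstructs the original row exactly.
recon = [diffs[0]]
for d in diffs[1:]:
    recon.append(recon[-1] + d)

assert recon == row
print(diffs)   # [127, 1, 0, 2, 1, 0, -2]
```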
Psychovisual Redundancy
Based on human perception.
Associated with real or quantifiable visual information.
Elimination of psychovisual redundancy results in loss of quantitative information. This is referred to as quantization.
Quantization – mapping of a broad range of input values to a limited number of output values.
Results in lossy data compression.
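A minimal sketch of such a quantizer, assuming 8-bit input mapped onto 16 reconstruction levels (the step size and mid-point reconstruction are illustrative choices, not a prescribed design):

```python
# Uniform quantization: 256 input gray levels -> 16 output levels.
def quantize(pixel, levels=16, max_val=256):
    step = max_val // levels                    # 16 gray levels per bin
    return (pixel // step) * step + step // 2   # mid-point reconstruction value

print([quantize(v) for v in (0, 7, 100, 200, 255)])   # [8, 8, 104, 200, 248]
```

The discarded precision cannot be recovered, which is why quantization is irreversible.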
Fidelity Criteria
Objective fidelity criteria
When the level of information loss can be expressed as a function of the original (input) image and the compressed and subsequently decompressed output image.
Example: Root Mean Square error between input and output images.
The error between input image f(x, y) and output image f̂(x, y) is
e(x, y) = f̂(x, y) − f(x, y)

Root-mean-square error:
e_rms = [ (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} ( f̂(x, y) − f(x, y) )² ]^(1/2)

Mean-Square Signal-to-Noise Ratio:
SNR_ms = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f̂(x, y)² / Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} ( f̂(x, y) − f(x, y) )²
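These two criteria translate directly into code; a short sketch assuming NumPy and two equal-size images (names and test data are illustrative):

```python
import numpy as np

def objective_fidelity(f, f_hat):
    """Root-mean-square error and mean-square SNR between input f and output f_hat."""
    err = f_hat.astype(float) - f.astype(float)
    e_rms = np.sqrt(np.mean(err ** 2))
    snr_ms = np.sum(f_hat.astype(float) ** 2) / np.sum(err ** 2)
    return e_rms, snr_ms

# Toy usage with random 8-bit images (illustration only).
f = np.random.randint(0, 256, (8, 8))
f_hat = np.clip(f + np.random.randint(-3, 4, (8, 8)), 0, 255)
print(objective_fidelity(f, f_hat))
```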
Subjective fidelity criteria
Measures image quality by subjective evaluations of a human observer.
Image Compression Models
Encoder – Source encoder + Channel encoder
Source encoder – removes coding, interpixel, and psychovisual redundancies in input image and outputs a set of symbols.
Channel encoder – To increase the noise immunity of the output of source encoder.
Decoder - Channel decoder + Source decoder
Source Encoder
Mapper
Transforms input data into a format designed to reduce interpixel redundancies in the input image.
Generally a reversible process.
May or may not directly reduce the amount of data required to represent the image.
Examples (a short sketch follows this list):
Run-length coding (directly results in data compression)
Transform coding
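As an illustration of the run-length example above, a minimal run-length coder for one row of pixels (a sketch only, not the scheme used by any particular standard):

```python
def rle_encode(row):
    """Encode a row of pixel values as (value, run_length) pairs."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [tuple(r) for r in runs]

print(rle_encode([0, 0, 0, 255, 255, 0, 0, 0, 0]))   # [(0, 3), (255, 2), (0, 4)]
```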
Quantizer
Reduces the accuracy of the mapper’s output in accordance with some pre-established fidelity criterion.
Reduces the psychovisual redundancies of the input image.
Irreversible process (irreversible information loss)
Must be omitted when error-free compression is desired.
Symbol encoder
Creates a fixed- or variable-length code to represent the quantizer output and maps the output in accordance with the code.
Usually, a variable-length code is used to represent the mapped and quantized output.
Assigns the shortest codewords to the most frequently occurring output values. Reduces coding redundancy.
Reversible process
Source decoder
Symbol decoder
Inverse Mapper
Inverse operations are performed in the reverse order.
Channel Encoder and Decoder
Essential when the channel is noisy or error-prone.
Source encoded data – highly sensitive to channel noise.
Channel encoder reduces the impact of channel noise by inserting controlled form of redundancy into the source encoded data.
Example
Hamming code – appends enough bits to the data being encoded to ensure that two valid codewords differ by a minimum number of bits.
7-bit Hamming(7,4) Code
7-bit codewords = 4-bit word + 3 bits of redundancy.
The distance between two valid codewords (the minimum number of bit changes required to change one codeword into another) is 3.
All single-bit errors can be detected and corrected.
The Hamming distance between two codewords is the number of places in which the codewords differ. The minimum distance of a code is the minimum number of bit changes between any two codewords. The Hamming weight of a codeword is the number of non-zero elements (1's) in the codeword.
Binary data b3b2b1b0    Hamming codeword h1h2h3h4h5h6h7
0000                    0000000
0001                    1101001
0010                    0101010
0011                    1000011
0100                    1001100
0101                    0100101
0110                    1100110
0111                    0001111
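The table above is consistent with the standard Hamming(7,4) parity assignments h1 = b3⊕b2⊕b0, h2 = b3⊕b1⊕b0, h4 = b2⊕b1⊕b0, with b3, b2, b1, b0 carried in positions h3, h5, h6, h7. A minimal encoder sketch under that convention (function name is ours):

```python
def hamming74_encode(b3, b2, b1, b0):
    """Return the 7-bit codeword h1..h7 for the 4-bit word b3 b2 b1 b0."""
    h1 = b3 ^ b2 ^ b0    # parity check covering b3, b2, b0
    h2 = b3 ^ b1 ^ b0    # parity check covering b3, b1, b0
    h4 = b2 ^ b1 ^ b0    # parity check covering b2, b1, b0
    return (h1, h2, b3, h4, b2, b1, b0)

# Reproduces the table rows, e.g. 0001 -> 1101001 and 0110 -> 1100110.
print(''.join(map(str, hamming74_encode(0, 0, 0, 1))))   # 1101001
print(''.join(map(str, hamming74_encode(0, 1, 1, 0))))   # 1100110
```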
Basics of Probability
Ref: http://en.wikipedia.org/wiki/Probability
Elements of Information Theory
Measuring Information
A random event E occurring with probability P(E) is said to contain
I(E) = log(1/P(E)) = −log P(E)
units of information.
I(E) is called the self-information of E.
Amount of self-information of an event E is inversely related to its probability.
If P(E) = 1, I(E) = 0. That is, there is no uncertainty associated with the event.
No information is conveyed because it is certain that the event will occur.
If base m logarithm is used, the measurement is in m-ary units.
If base is 2, the measurement is in binary units. The unit of information is called a bit.
If P(E) = ½, I(E) = −log(½) = 1 bit. That is, 1 bit of information is conveyed when one of two possible equally likely outcomes occurs.
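A tiny sketch checking these values (base-2 logarithm, so the unit is bits; the helper name is ours):

```python
import math

def self_information(p):
    """Self-information I(E) = -log2 P(E), in bits."""
    return 0.0 if p == 1 else -math.log2(p)

print(self_information(0.5))   # 1.0 bit
print(self_information(1.0))   # 0.0 -> a certain event carries no information
```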
The Information Channel
Information channel is the physical medium that connects the information source to the user of information.
Self-information is transferred between an information source and a user of the information, through the information channel.
Information source – Generates a random sequence of symbols from a finite or countably infinite set of possible symbols.
Output of the source is a discrete random variable.
The set of source symbols or letters {a1, a2, …, aJ} is referred to as the source alphabet A.
The probability of the event that the source will produce symbol aj is P(aj), with
Σ_{j=1}^{J} P(aj) = 1
The J×1 vector z = [P(a1), P(a2), ..., P(aJ)]^T is used to represent the set of all source symbol probabilities.
The finite ensemble (A, z) describes the information source completely.
The probability that the discrete source will emit symbol a j is P(a j).
Therefore, the self-information generated by the production of a single source symbol is I(aj) = −log P(aj).
If k source symbols are generated, the average self-information obtained from the k outputs is
−k P(a1) log P(a1) − k P(a2) log P(a2) − ... − k P(aJ) log P(aJ) = −k Σ_{j=1}^{J} P(aj) log P(aj)
The average information per source output, denoted as H(z), is
H(z) = E[I(z)] = Σ_{j=1}^{J} P(aj) I(aj) = Σ_{j=1}^{J} P(aj) log(1/P(aj)) = −Σ_{j=1}^{J} P(aj) log P(aj)
This is called the uncertainty or entropy of the source.
It is the average amount of information (in m-ary units per symbol) obtained by observing a single source output.
If the source symbols are equally probable, the entropy is maximized and the source provides maximum possible average information per source symbol.
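A short sketch (function name is ours) computing H(z) for a few distributions; the uniform distribution gives the largest value, as stated above:

```python
import math

def entropy(probs):
    """First-order entropy H(z) = -sum p log2 p, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol (maximum for J = 4)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0 bits/symbol (no uncertainty)
```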
A simple information system
Output of the channel is also a discrete random variable which takes on values from a finite or countably infinite set of symbols {b1, b2, …, bK} called the channel alphabet B.
The finite ensemble (B, v), where v = [P(b1), P(b2), ..., P(bK)]^T, describes the channel output completely and thus the information received by the user.
The probability P(bk) of a given channel output and the probability distribution of the source z are related as
P(bk) = Σ_{j=1}^{J} P(bk | aj) P(aj)
where P(bk | aj) is the conditional probability that the output symbol bk is received, given that the source symbol aj was generated.
Forward Channel Transition Matrix or Channel Matrix
Q = [ P(b1|a1)  P(b1|a2)  ...  P(b1|aJ)
      P(b2|a1)  P(b2|a2)  ...  P(b2|aJ)
         :         :      ...     :
      P(bK|a1)  P(bK|a2)  ...  P(bK|aJ) ]
Matrix element: qkj = P(bk | aj)
The probability distribution of the output alphabet can be computed from v = Qz
Entropy:
H(z) = E[I(z)] = Σ_{j=1}^{J} P(aj) I(aj) = −Σ_{j=1}^{J} P(aj) log P(aj)

Conditional entropy function:
H(z | bk) = E[I(z | bk)] = Σ_{j=1}^{J} P(aj | bk) I(aj | bk) = −Σ_{j=1}^{J} P(aj | bk) log P(aj | bk)
where P(aj | bk) is the probability that symbol aj was transmitted by the source, given that the user receives bk.
The expected (average) value over all bk is
H(z | v) = Σ_{k=1}^{K} H(z | bk) P(bk)
         = −Σ_{k=1}^{K} Σ_{j=1}^{J} P(aj | bk) log P(aj | bk) P(bk)
         = −Σ_{k=1}^{K} Σ_{j=1}^{J} P(aj | bk) P(bk) log P(aj | bk)

Using the conditional probability P(aj | bk) = P(aj, bk) / P(bk),
H(z | v) = −Σ_{k=1}^{K} Σ_{j=1}^{J} P(aj, bk) log P(aj | bk)
P(aj, bk) is the joint probability of aj and bk, that is, the probability that aj is transmitted and bk is received.
Mutual information
H(z) is the average information per source symbol, assuming no knowledge of the output symbol.
H(z|v) is the average information per source symbol, assuming observation of the output symbol.
The difference between H(z) and H(z|v) is the average information received upon observing the output symbol, and is called the mutual information of z and v, given by
I(z|v) = H(z) - H(z|v)
I(z | v) = H(z) − H(z | v)
         = −Σ_{j=1}^{J} P(aj) log P(aj) + Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log P(aj | bk)

Since P(aj) = P(aj, b1) + P(aj, b2) + ... + P(aj, bK) = Σ_{k=1}^{K} P(aj, bk),

I(z | v) = −Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log P(aj) + Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log P(aj | bk)
         = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log [ P(aj | bk) / P(aj) ]
         = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log [ P(aj, bk) / ( P(aj) P(bk) ) ]

Using P(aj, bk) = P(aj | bk) P(bk) = P(bk | aj) P(aj),

I(z | v) = Σ_{j=1}^{J} Σ_{k=1}^{K} P(bk | aj) P(aj) log [ P(bk | aj) P(aj) / ( P(aj) P(bk) ) ]
         = Σ_{j=1}^{J} Σ_{k=1}^{K} qkj P(aj) log [ qkj / P(bk) ]

and, since P(bk) = Σ_{j=1}^{J} P(bk | aj) P(aj),

I(z | v) = Σ_{j=1}^{J} Σ_{k=1}^{K} qkj P(aj) log [ qkj / Σ_{i=1}^{J} qki P(ai) ]
The minimum possible value of I( z|v) is zero.
Occurs when the input and output symbols are statistically independent, that is, when P(aj, bk) = P(aj) P(bk).
In that case,
I(z | v) = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log [ P(aj, bk) / ( P(aj) P(bk) ) ]
         = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log [ P(aj) P(bk) / ( P(aj) P(bk) ) ]
         = Σ_{j=1}^{J} Σ_{k=1}^{K} P(aj, bk) log 1 = 0
Channel Capacity
The maximum value of I(z | v) over all possible choices of source probabilities in the vector z is called the capacity, C, of the channel described by channel matrix Q:
C = max_z [ I(z | v) ]
Channel capacity is the maximum rate at which information can be transmitted reliably through the channel.
Binary information source
Binary Symmetric Channel (BSC)
Binary Information Source
Source alphabet: A = {a1, a2} = {0, 1}
P(a1) = p_bs, P(a2) = 1 − p_bs
Entropy of the source: H(z) = −p_bs log2 p_bs − (1 − p_bs) log2 (1 − p_bs)
where z = [P(a1), P(a2)]^T = [p_bs, 1 − p_bs]^T

The quantity −p_bs log2 p_bs − (1 − p_bs) log2 (1 − p_bs) is called the binary entropy function, denoted H_bs(·). For example, H_bs(t) = −t log2 t − (1 − t) log2 (1 − t).
Binary Symmetric Channel (Noisy Binary Information Channel)
Let the probability of error during transmission of any symbol be pe.
Channel matrix for the BSC:
Q = [ P(b1|a1)  P(b1|a2) ]   [ P(0|0)  P(0|1) ]   [ 1−pe   pe  ]
    [ P(b2|a1)  P(b2|a2) ] = [ P(1|0)  P(1|1) ] = [ pe    1−pe ]
Output alphabet: B = {b1, b2} = {0, 1}, with v = [P(b1), P(b2)]^T = [P(0), P(1)]^T
The probabilities of receiving output symbols b1 and b2 are determined by v = Qz:
P(0) = (1 − pe) p_bs + pe (1 − p_bs)
P(1) = pe p_bs + (1 − pe)(1 − p_bs)
The mutual information of the BSC can be computed from
I(z | v) = Σ_{j=1}^{2} Σ_{k=1}^{2} qkj P(aj) log2 [ qkj / Σ_{i=1}^{2} qki P(ai) ]
= q11 P(a1) log2[ q11 / (q11 P(a1) + q12 P(a2)) ] + q21 P(a1) log2[ q21 / (q21 P(a1) + q22 P(a2)) ]
  + q12 P(a2) log2[ q12 / (q11 P(a1) + q12 P(a2)) ] + q22 P(a2) log2[ q22 / (q21 P(a1) + q22 P(a2)) ]
Substituting the BSC values q11 = q22 = 1 − pe, q12 = q21 = pe, P(a1) = p_bs, P(a2) = 1 − p_bs, and simplifying gives
I(z | v) = H_bs( (1 − pe) p_bs + pe (1 − p_bs) ) − H_bs( pe )
where H_bs(·) is the binary entropy function defined above.
Capacity of BSC
Maximum of the mutual information over all source distributions.
I(z | v) is maximum when p_bs = 1/2, which corresponds to z = [1/2, 1/2]^T. Then
I(z | v) = H_bs( (1 − pe) · 1/2 + pe · 1/2 ) − H_bs(pe)
         = H_bs(1/2) − H_bs(pe)
         = log2 2 − H_bs(pe)
         = 1 − H_bs(pe)
so the capacity of the BSC is C = 1 − H_bs(pe) (in bits per symbol).
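A small numerical check of this result (a sketch; the helper names are ours), evaluating the mutual information of a BSC for several source distributions and comparing against 1 − H_bs(pe):

```python
import math

def Hbs(t):
    """Binary entropy function, in bits."""
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def bsc_mutual_info(p_bs, p_e):
    """I(z|v) for a binary symmetric channel with error probability p_e."""
    p0 = (1 - p_e) * p_bs + p_e * (1 - p_bs)    # P(output = 0)
    return Hbs(p0) - Hbs(p_e)

p_e = 0.1
for p_bs in (0.1, 0.3, 0.5, 0.7):
    print(p_bs, round(bsc_mutual_info(p_bs, p_e), 4))

print("capacity:", round(1 - Hbs(p_e), 4))      # maximum, attained at p_bs = 0.5
```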
Fundamental Coding Theorems
The Noiseless Coding Theorem or Shannon’s First Theorem or Shannon’s Source Coding Theorem for Lossless Data Compression
Applies when both the information channel and the communication system are error-free.
Defines the minimum average codeword length per source symbol that can be achieved.
Aim: to represent the source as compactly as possible.
Let the information source (A, z), with statistically independent source symbols, output an n-tuple of symbols from the source alphabet A. Then the source output takes on one of the J^n possible values, denoted αi, from
A′ = {α1, α2, α3, ..., α_{J^n}}
The probability of a given αi, P(αi), is related to the single-symbol probabilities as
P(αi) = P(aj1) P(aj2) ... P(ajn),   z′ = {P(α1), P(α2), ..., P(α_{J^n})}
The entropy of the source is given by
H(z′) = −Σ_{i=1}^{J^n} P(αi) log P(αi)
      = −Σ_{i=1}^{J^n} P(aj1) P(aj2) ... P(ajn) log [ P(aj1) P(aj2) ... P(ajn) ]
      = n H(z)
Hence, the entropy of the zero-memory source is n times the entropy of the corresponding single symbol source. Such a source is called the n th extension of single-symbol source.
The self-information of αi is log(1/P(αi)).
αi is therefore represented by a codeword whose length l(αi) is the smallest integer exceeding the self-information of αi. That is,
log(1/P(αi)) ≤ l(αi) < log(1/P(αi)) + 1
Multiplying by P(αi) and summing over all i,
Σ_{i=1}^{J^n} P(αi) log(1/P(αi)) ≤ Σ_{i=1}^{J^n} P(αi) l(αi) < Σ_{i=1}^{J^n} P(αi) log(1/P(αi)) + 1
H(z′) ≤ L′avg < H(z′) + 1
where L′avg = Σ_{i=1}^{J^n} P(αi) l(αi).
Since H(z′) = n H(z), dividing by n gives
H(z) ≤ L′avg / n < H(z) + 1/n
and therefore
lim_{n→∞} [ L′avg / n ] = H(z)
Shannon’s source coding theorem for lossless data compression states that, for any code used to represent the symbols from a source, the minimum number of bits required to represent the source symbols on average must be at least equal to the entropy of the source:
H(z) ≤ L′avg / n < H(z) + 1/n
The efficiency of any encoding strategy can be defined as
η = n H(z) / L′avg = H(z′) / L′avg
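A brief numerical illustration of this efficiency measure for n = 1, using an assumed 4-symbol source and codeword lengths (illustrative values only):

```python
import math

p = [0.5, 0.25, 0.125, 0.125]   # assumed source symbol probabilities
l = [1, 2, 3, 3]                # assumed codeword lengths (e.g. 0, 10, 110, 111)

H = -sum(pi * math.log2(pi) for pi in p)       # source entropy, bits/symbol
L_avg = sum(pi * li for pi, li in zip(p, l))   # average codeword length

print(H, L_avg, H / L_avg)   # 1.75 1.75 1.0 -> this code achieves efficiency 1
```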
The Noisy Coding Theorem or Shannon’s Second Theorem
Applies when the channel is noisy or prone to error.
Aim: to encode information so that the communication is made reliable and the error is minimized.
Use of a repetitive coding scheme (see the sketch below).
Encode the nth extension of the source using K-ary code sequences of length r, with K^r ≤ J^n.
Select only φ of the K^r possible code sequences as valid codewords.
A zero-memory information source generates information at a rate equal to its entropy.
The nth extension of the source provides information at a rate of H(z′)/n information units per symbol.
If the information is coded, the maximum rate of coded information is (log φ)/r and occurs when the φ valid codewords used to code the source are equally probable.
Hence, a code of size φ and block length r is said to have a rate of R = (log φ)/r information units per symbol.
The noisy coding theorem thus states that, for any R < C (where C is the channel capacity), there exists an integer r and a code of block length r and rate R such that the probability of a block decoding error is less than or equal to ε, for any ε > 0.
That is, the probability of error can be made arbitrarily small so long as the coded message rate is less than the capacity of the channel.
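To make the repetition idea concrete, a small simulation sketch (assuming a 3× repetition code with majority-vote decoding over a BSC; all names and parameters are illustrative):

```python
import random

def bsc(bit, p_e):
    """Flip the bit with probability p_e (binary symmetric channel)."""
    return bit ^ (random.random() < p_e)

def send_with_repetition(bit, p_e, r=3):
    """Transmit the bit r times and decode by majority vote."""
    received = [bsc(bit, p_e) for _ in range(r)]
    return int(sum(received) > r // 2)

random.seed(0)
p_e, trials = 0.1, 100_000
errors = sum(send_with_repetition(1, p_e) != 1 for _ in range(trials))
# Uncoded bit error rate is p_e = 0.1; with 3x repetition it drops to roughly
# 3*p_e^2*(1 - p_e) + p_e^3 = 0.028, at the cost of tripling the data rate.
print(errors / trials)
```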
The Source Coding Theorem for Lossy Data Compression
Applies when the channel is error-free, but the communication process is lossy.
Aim: information compression.
To determine the smallest rate at which information about the source can be conveyed to the user.
To encode the source so that the average distortion is less than a maximum allowable level D.
Let the information source and decoder output be defined by (A, z) and (B, v), respectively. A nonnegative cost function ρ(aj, bk), called the distortion measure, is used to define the penalty associated with reproducing source output aj with decoder output bk.
The average value of the distortion is given by
d(Q) = Σ_{j=1}^{J} Σ_{k=1}^{K} ρ(aj, bk) P(aj, bk) = Σ_{j=1}^{J} Σ_{k=1}^{K} ρ(aj, bk) P(aj) qkj
where Q is the channel matrix.

The rate distortion function R(D) is defined as
R(D) = min_{Q ∈ Q_D} I(z, v)
where Q_D = { qkj | d(Q) ≤ D } is the set of all D-admissible encoding-decoding procedures.
If D = 0, R(D) is less than or equal to the entropy of the source, i.e., R(0) ≤ H(z).
R(D) = min_{Q ∈ Q_D} I(z, v) defines the minimum rate at which information can be conveyed to the user, subject to the constraint that the average distortion be less than or equal to D.
I(z, v) is minimized subject to: qkj ≥ 0, Σ_{k=1}^{K} qkj = 1, and d(Q) = D.
d(Q) = D indicates that the minimum information rate occurs when the maximum possible distortion is allowed.
Shannon’s Source Coding Theorem for Lossy Data Compression states that, for a given source (with all its statistical properties known) and a given distortion measure, there is a function R(D), called the rate distortion function, such that if D is the tolerable amount of distortion, then R(D) is the best possible compression rate.
The theory of lossy data compression is also known as rate distortion theory.
The lossless data compression theory and lossy data compression theory are collectively known as the source coding theory .
Thank You